HPE Ezmeral Data Fabric 6.1.x is In Maintenance and transitions to "End of Maintenance" in June 2024. Please see the latest documentation.

New API in Pig 0.16.0

IMPORTANT This component is deprecated. Hewlett Packard Enterprise recommends using an alternate product. For more information, see Discontinued Ecosystem Components.

Pig 0.16.0 includes the following new classes and interfaces.

New Classes

Class	Description
org.apache.pig.piggybank.evaluation.string.REPLACE_MULTI	An eval function that replaces all occurrences of each search key with its replacement value. The first argument is the source string on which the `REPLACE_MULTI` operation is performed. The second argument is a map of search keys to replacement values. For example, after `input_data = LOAD 'input_data' AS (name);` where `name` is `'Hello World!'`, the statement `replaced_name = FOREACH input_data GENERATE REPLACE_MULTI(name, [' '#'_', '!'#'', 'e'#'a', 'o'#'oo']);` produces `Halloo_Woorld`.
org.apache.pig.piggybank.storage.apachelog.LogFormatLoader	A Pig loader that can load Apache HTTPD access logs written in (almost) any Apache HTTPD LogFormat. Basic usage: give the loader your (custom) LogFormat specification, and it shows the fields that can be extracted from that format.
org.apache.pig.CounterBasedErrorHandler	Handles errors thrown by `StoreFuncInterface.putNext()`.
org.apache.pig.backend.hadoop.HKerberos	Supports logging in with a Kerberos keytab file. Kerberos is an authentication system that uses tickets with a limited validity time. Running a Pig script on a Kerberos-secured Hadoop cluster therefore limits the running time to at most the remaining validity time of the tickets, which can be a problem for complex analytics jobs that need to run longer than the ticket lifetime allows. A Kerberos keytab file is a Kerberos-specific form of a user's password. By creating a keytab file and making it part of the job that runs in the cluster, you enable the Hadoop job to request new tickets when the current ones expire. This extends the maximum job duration beyond the maximum renew time of the Kerberos tickets.
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigWritableComparators	Byte-only raw comparators that provide faster comparison for non-orderby jobs. This class does not reuse `JobControlCompiler.Pig<DataType>WritableComparator`, which extends `PigWritableComparator`. `PigNullablePartitionWritable.compare` is inefficient in cases where the tuple is iterated for null checking instead of taking advantage of `TupleRawComparator.hasComparedTupleNull()`. This class also skips multi-query index checking.
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator	This class is used to decorate the `StoreFunc#putNext(Tuple)`. It handles errors by calling `OutputErrorHandler#handle(String, long, Throwable)` if the `StoreFunc` implements `ErrorHandling`.
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigInputFormatTez	Extends `org.apache.hadoop.mapreduce.InputFormat` and implements Pig and Tez specific functions.
org.apache.pig.backend.hadoop.executionengine.tez.util.TezUDFContextSeparator	Extends a visitor for the `TezOperPlan` class and serializes all `LoadFunc`, `StoreFunc`, and `UserFunc` objects.
org.apache.pig.impl.io.compress.BZip2CodecWithExtensionBZ	For historical reasons, Pig supports both `.bz` and `.bz2` as `bzip2` file extensions. This class returns the additional `bzip2` file extension, `.bz`, as a string.
org.apache.pig.impl.util.UDFContextSeparator	`UDFContextSeparator` extends `PhyPlanVisitor`, which is the visitor class for the physical plan. To use it, create the visitor with the plan to be visited, and then call the `visit()` method to traverse the plan in a depth-first fashion. This class also visits the nested plans inside the operators. Extend this class to modify the nature of each visit and to maintain any relevant state information between the visits to two different operators.
org.apache.pig.parser.RegisterResolver	Resolves a JAR with a scripting language or namespace.
org.apache.pig.tools.DownloadResolver	Makes a list of URIs of the downloaded JARs.
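
The `REPLACE_MULTI` behavior described in the table can be approximated outside Pig. The following is a minimal Java sketch, not the piggybank implementation: the class and method names (`ReplaceMultiSketch`, `replaceMulti`) are hypothetical, and the sketch assumes that search keys are matched as literal strings and applied one after another in map order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of REPLACE_MULTI semantics; not Pig code.
class ReplaceMultiSketch {

    // Replaces every occurrence of each key with its mapped value.
    // Assumption: keys are literal strings, applied in map iteration order.
    public static String replaceMulti(String source, Map<String, String> replacements) {
        if (source == null || replacements == null) {
            return source;
        }
        String result = source;
        for (Map.Entry<String, String> entry : replacements.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Same search keys and replacement values as the Pig example in the table.
        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put(" ", "_");
        replacements.put("!", "");
        replacements.put("e", "a");
        replacements.put("o", "oo");

        System.out.println(replaceMulti("Hello World!", replacements)); // prints Halloo_Woorld
    }
}
```

Because the replacements are applied sequentially in this sketch, a replacement value that contains a later search key would be rewritten again. The example map avoids that situation, so the result does not depend on the order of the entries.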

New Interfaces

Interface	Description
org.apache.pig.ErrorHandler	The interface that handles errors thrown by `StoreFuncInterface.putNext(Tuple)`.
org.apache.pig.ErrorHandling	The interface to enable handling of errors during `StoreFunc#putNext(Tuple)`.
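
The error-handling pattern described above (a decorator around `putNext` that passes failures to a counter-based handler) can be sketched in plain Java. Every type in this sketch (`RecordWriterSketch`, `ErrorHandlerSketch`, `CountingHandler`, `DecoratedWriter`) is a hypothetical stand-in written for illustration. None of them is a Pig interface or class, and the fixed error limit is an assumption rather than Pig's configured behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of decorating a store call with error handling; not Pig code.
class StoreErrorHandlingSketch {

    // Stand-in for a store function that writes one record at a time.
    interface RecordWriterSketch {
        void putNext(String record) throws Exception;
    }

    // Stand-in for an error handler that is told about each outcome.
    interface ErrorHandlerSketch {
        void onSuccess();
        void onError(Exception e);
    }

    // Counts outcomes and fails the job once more than maxErrors records have failed.
    static final class CountingHandler implements ErrorHandlerSketch {
        private final long maxErrors;
        private long successes;
        private long errors;

        CountingHandler(long maxErrors) {
            this.maxErrors = maxErrors;
        }

        @Override
        public void onSuccess() {
            successes++;
        }

        @Override
        public void onError(Exception e) {
            errors++;
            if (errors > maxErrors) {
                throw new RuntimeException("Too many failed records: " + errors, e);
            }
        }

        long successes() { return successes; }
        long errors() { return errors; }
    }

    // Decorator: forwards each record to the writer and reports the outcome to the handler.
    static final class DecoratedWriter {
        private final RecordWriterSketch delegate;
        private final ErrorHandlerSketch handler;

        DecoratedWriter(RecordWriterSketch delegate, ErrorHandlerSketch handler) {
            this.delegate = delegate;
            this.handler = handler;
        }

        void putNext(String record) {
            try {
                delegate.putNext(record);
            } catch (Exception e) {
                handler.onError(e); // the bad record is skipped unless the handler rethrows
                return;
            }
            handler.onSuccess();
        }
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<>();
        CountingHandler handler = new CountingHandler(1); // tolerate one bad record
        DecoratedWriter writer = new DecoratedWriter(record -> {
            if (record.isEmpty()) {
                throw new IllegalArgumentException("empty record");
            }
            stored.add(record);
        }, handler);

        for (String record : new String[] {"a", "", "b"}) {
            writer.putNext(record);
        }
        System.out.println(stored + " errors=" + handler.errors());
    }
}
```

The point of the decorator is that the writer itself stays unaware of the error policy: whether a bad record is skipped or fails the job is decided entirely by the handler that is plugged in.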