HPE Ezmeral Data Fabric 6.1.x is "In Maintenance" and transitions to "End of Maintenance" in June 2024. Please see the latest documentation.

Drill Storage and Format Plugin Support Matrix

Drill itself does not require Hadoop: you can deploy it without Hadoop in a standalone configuration on a single node. However, multi-node standalone cluster deployments of Drill are not supported.

The following table lists the supported and unsupported data sources and formats in Drill:

Data Source	Storage Plugin Type	Formats	Supported
MapR File System	dfs	Text (CSV, TSV, PSV)	Yes
		Parquet	Yes
		JSON	Yes
		Avro	No
MapR Database	dfs	Binary	Yes
		JSON	Yes
HBase	hbase	Binary	No (as of Drill 1.11 and MapR 6.0)
Hive	hive	Text (CSV, TSV, PSV)	Yes
		Parquet	Yes
		JSON	Yes
		Avro	Yes
		Other Hive built-in SerDes	Yes (not recommended because of memory overhead and performance implications)
S3	s3	Supports the same formats as the dfs storage plugin.	Yes
MongoDB	mongodb	N/A	No
RDBMS	jdbc	N/A	No
Kudu	kudu	N/A	No
Kafka	kafka	JSON	No (see note below)
OpenTSDB	openTSDB	N/A	No (see note below)

NOTE The kafka storage plugin for MapR Streams (MapR Event Store For Apache Kafka) is in the Alpha testing phase and is not officially supported. See Configuring the Kafka Storage Plugin for more information.

NOTE The openTSDB storage plugin is not officially supported. See OpenTSDB Storage Plugin for more information.
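The following Python sketch shows how a supported source from the table is reached in practice. It submits a SQL query to Drill's REST API (POST /query.json) and reads a MapR Database JSON table through the dfs storage plugin. The sketch makes several assumptions: it expects an unsecured Drillbit listening on localhost at Drill's default HTTP port, 8047. Secured clusters require authentication, which is not shown. The table path /tables/customers is hypothetical.

```python
import json
import urllib.request

# Assumption: an unsecured Drillbit serving its REST API on the default port 8047.
DRILL_URL = "http://localhost:8047"

# Hypothetical MapR Database JSON table, read through the dfs storage plugin.
EXAMPLE_SQL = "SELECT * FROM dfs.`/tables/customers` LIMIT 5"


def build_query_request(sql: str, base_url: str = DRILL_URL) -> urllib.request.Request:
    """Build a POST request for Drill's /query.json REST endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str, base_url: str = DRILL_URL) -> list:
    """Send the query to the Drillbit and return the "rows" list from its JSON reply."""
    with urllib.request.urlopen(build_query_request(sql, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))["rows"]
```

With a reachable Drillbit, run_query(EXAMPLE_SQL) returns each result row as a dictionary keyed by column name. Separating request construction from the network call makes it possible to inspect the request without a running cluster.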

NOTE As of MapR 6.0 and Drill 1.11, HBase is no longer supported; therefore, the communication path between Drill and HBase is also unsupported. If you have an hbase storage plugin configured in Drill, disable it.
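You can disable the hbase plugin from the Storage page of the Drill Web UI or through Drill's REST API. The Python sketch below uses the enable/disable endpoint, GET /storage/{name}/enable/{val}, as documented for Drill 1.x releases. Newer Drill releases may expect POST for this endpoint, so check the REST API reference for your Drill version. The sketch assumes an unsecured Drillbit on localhost:8047 and a storage plugin configuration named hbase.

```python
import json
import urllib.parse
import urllib.request

# Assumption: an unsecured Drillbit serving its REST API on the default port 8047.
DRILL_URL = "http://localhost:8047"


def plugin_enable_url(name: str, enabled: bool, base_url: str = DRILL_URL) -> str:
    """Build the URL of Drill's enable/disable endpoint for a storage plugin."""
    state = "true" if enabled else "false"
    # Percent-encode the plugin name, including any "/" characters.
    return f"{base_url}/storage/{urllib.parse.quote(name, safe='')}/enable/{state}"


def set_plugin_enabled(name: str, enabled: bool, base_url: str = DRILL_URL) -> dict:
    """Issue a GET to the enable/disable endpoint and return Drill's JSON response."""
    with urllib.request.urlopen(plugin_enable_url(name, enabled, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, set_plugin_enabled("hbase", False) disables the hbase plugin and returns Drill's JSON status response. The exact content of that response depends on the Drill version.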