HPE Ezmeral Data Fabric 6.2 is In Maintenance and transitions to "End of Maintenance" in June 2024. Please see the latest documentation.

Apache Drill

Drill is a low-latency distributed query engine for large-scale datasets, including structured and semi-structured or nested data. Inspired by Google’s Dremel, Drill is designed to scale to thousands of nodes and to query petabytes of data at the interactive speeds that BI and analytics environments require.

Drill includes a distributed execution environment that is purpose-built for large-scale data processing. At the core of Drill is the "Drillbit" service, which accepts requests from the client, processes the queries, and returns results to the client.
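
The request-and-response cycle that a Drillbit handles can be sketched with Drill's REST API. The following Python example is a minimal sketch, not the documented client procedure for this release. It assumes a Drillbit whose web server listens on http://localhost:8047 with authentication disabled, and the helper names build_query_request and run_query are hypothetical. A secured cluster would also need HTTPS and credentials, which this sketch omits.

```python
import json
import urllib.request


def build_query_request(base_url, sql):
    """Build (but do not send) a POST request that submits one SQL query.

    base_url is an assumption, for example "http://localhost:8047".
    """
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(base_url, sql, timeout=30):
    """Send the query to a Drillbit and return the decoded JSON response."""
    request = build_query_request(base_url, sql)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running Drillbit, a call such as run_query("http://localhost:8047", "SELECT * FROM sys.version") would return a dictionary that contains the result columns and rows.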

Installing Drill

You can install Drill on one node or on multiple nodes in a cluster. When Drill runs on each data node in a cluster, it can maximize data locality during query execution and avoid moving data over the network or between nodes. Drill uses ZooKeeper to maintain cluster membership and health-check information.

See Installing Drill for instructions and additional information.
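
Because Drill tracks cluster membership in ZooKeeper, JDBC clients commonly locate Drillbits through the ZooKeeper quorum rather than through a fixed Drillbit address. The following Python sketch only assembles such a connection URL. The function name drill_jdbc_url is hypothetical, and the default port, directory, and cluster ID shown here are assumptions that must be replaced with the values configured for your cluster.

```python
def drill_jdbc_url(zk_hosts, zk_port=5181, directory="drill", cluster_id="drillbits1"):
    """Assemble a Drill JDBC URL of the form
    jdbc:drill:zk=<host:port>[,<host:port>...]/<directory>/<cluster ID>.

    All default values are assumptions; check your Drill configuration.
    """
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    quorum = ",".join(f"{host}:{zk_port}" for host in zk_hosts)
    return f"jdbc:drill:zk={quorum}/{directory}/{cluster_id}"
```

For example, drill_jdbc_url(["zk1", "zk2"]) produces a URL that lists both ZooKeeper nodes, so a client can still find a Drillbit when one ZooKeeper node is unavailable.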

Configuring Data Source Connections

Drill connects to data sources through storage plugins. Drill can connect to several types of data sources including databases, local or distributed filesystems, and Hive metastores.

See Connecting Drill to Data Sources and Connect a Data Source for instructions and additional information.
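
A storage plugin is defined by a JSON configuration. As an illustration only, the following Python sketch builds the definition of a file-type plugin and serializes it to JSON. The plugin name, workspace name, paths, and the helper file_plugin_definition are hypothetical, and the maprfs:/// connection value is an assumption about how a Data Fabric file system is addressed. Compare the result with the plugin configurations on your own cluster before you use it.

```python
import json


def file_plugin_definition(name, connection, workspaces):
    """Return a dictionary describing a file-type storage plugin.

    workspaces maps a workspace name to a directory location.
    """
    return {
        "name": name,
        "config": {
            "type": "file",
            "enabled": True,
            "connection": connection,
            "workspaces": {
                ws_name: {
                    "location": location,
                    "writable": False,
                    "defaultInputFormat": None,
                }
                for ws_name, location in workspaces.items()
            },
            # Only the JSON format is declared, to keep the sketch short.
            "formats": {"json": {"type": "json", "extensions": ["json"]}},
        },
    }


definition = file_plugin_definition("mydfs", "maprfs:///", {"logs": "/data/logs"})
print(json.dumps(definition, indent=2))
```

With a plugin and workspace defined this way, a query would refer to a file through the plugin and workspace names, for example mydfs.logs followed by the file name in backticks.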

Accessing Drill

After you install Drill and configure connections to your data sources, you can access Drill from any of the following user interfaces:

Additional Resources

Drill documentation is accessible from the following locations: