Configuring Conda Python for Zeppelin

Describes how to configure Conda Python for Zeppelin.

The following steps assume that the Miniconda distribution of Conda Python is already installed. For more information, see the Conda documentation.

Use these steps:

Create a Conda zip archive containing Python and all the libraries that you need.
The following example creates a custom Conda environment with Python 2 and three packages (matplotlib, numpy, and pandas):
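```
# Create the environment in a local directory (Python 2 plus the three packages)
mkdir custom_pyspark_env
conda create -p ./custom_pyspark_env python=2 numpy pandas matplotlib
# Zip from inside the directory so that bin/python sits at the archive root
cd custom_pyspark_env
zip -r custom_pyspark_env.zip ./
```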
The following example creates a custom Conda environment with Python 3 and three packages (matplotlib, numpy, and pandas):
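```
# Same procedure with Python 3
mkdir custom_pyspark3_env
conda create -p ./custom_pyspark3_env python=3 numpy pandas matplotlib
# Zip from inside the directory so that bin/python sits at the archive root
cd custom_pyspark3_env
zip -r custom_pyspark3_env.zip ./
```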
IMPORTANT: Do not create an archive named pyspark.zip. This name is reserved for PySpark internals.
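Before uploading, it can help to sanity-check the archive. The following is a minimal sketch, not part of the product tooling: the function names are hypothetical, and the second check assumes that `unzip` is installed and that the archive was zipped from inside the environment directory, so that `bin/python` is at its top level.

```shell
# Succeeds unless the archive file name is the reserved name pyspark.zip.
archive_name_ok() {
    [ "$(basename "$1")" != "pyspark.zip" ]
}

# Succeeds if the archive lists bin/python at its top level.
# unzip -Z1 prints one entry name per line (zipinfo mode).
archive_has_python() {
    unzip -Z1 "$1" | grep -qx -e 'bin/python' -e './bin/python'
}
```

For example, `archive_name_ok custom_pyspark_env.zip && archive_has_python custom_pyspark_env.zip && echo ok` prints `ok` only when both checks pass.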
Upload the archive to the data-fabric file system. For example, if the archive name is custom_pyspark_env.zip, and you want to put the archive in a directory that all users can read:
```
hadoop fs -mkdir -p /apps/zeppelin
hadoop fs -put custom_pyspark_env.zip /apps/zeppelin
```
Add the full path to the archive (including the maprfs:// scheme) to the spark.yarn.dist.archives property, and configure the Spark or Livy interpreter to use Python from this distribution.
Note that all archives listed in the property are extracted into the working directory of the YARN application.
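As a rough illustration of the value format, the sketch below builds the archive URI from the upload directory and archive name used in the example above and prints the resulting property line. The helper name `archive_uri` is hypothetical, and the form `maprfs://` followed by an absolute path assumes that the archive is on the default cluster.

```shell
# Hypothetical helper: join the maprfs:// scheme, an absolute directory in the
# data-fabric file system, and the archive file name.
archive_uri() {
    printf 'maprfs://%s/%s\n' "${1%/}" "$(basename "$2")"
}

# Print the property as it could be entered in the interpreter settings.
printf 'spark.yarn.dist.archives=%s\n' \
    "$(archive_uri /apps/zeppelin custom_pyspark_env.zip)"
# prints: spark.yarn.dist.archives=maprfs:///apps/zeppelin/custom_pyspark_env.zip
```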
For the Spark interpreter, set the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables. You can do this in the Spark interpreter configuration.
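A sketch of what these values could look like for the archive uploaded above. It assumes that YARN unpacks the archive into a directory named after the archive file in each container's working directory, and that `bin/python` sits at the archive root. The helper name and the driver path are illustrative placeholders, not values from this documentation; if the driver runs on the Zeppelin host rather than in a YARN container, PYSPARK_DRIVER_PYTHON likely needs to point to a Python that exists locally on that host.

```shell
# Hypothetical helper: path to the Python binary inside the extracted archive,
# relative to the YARN container working directory.
python_in_archive() {
    printf './%s/bin/python\n' "$(basename "$1")"
}

ARCHIVE=maprfs:///apps/zeppelin/custom_pyspark_env.zip
# Placeholder for a Python that is available locally to the driver (assumption).
DRIVER_PYTHON=/path/to/custom_pyspark_env/bin/python

printf 'PYSPARK_PYTHON=%s\n' "$(python_in_archive "$ARCHIVE")"
printf 'PYSPARK_DRIVER_PYTHON=%s\n' "$DRIVER_PYTHON"
```

The first line printed is `PYSPARK_PYTHON=./custom_pyspark_env.zip/bin/python`; adjust the driver value to match your installation.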
For the Livy interpreter, set the livy.spark.yarn.appMasterEnv.PYSPARK_PYTHON property.
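A comparable sketch for the Livy interpreter, under the same assumptions about how YARN unpacks the archive. The helper name is hypothetical, and the `livy.spark.yarn.dist.archives` line assumes that the Zeppelin Livy interpreter forwards properties with the `livy.spark.` prefix to Spark; verify the exact property names against your interpreter settings.

```shell
# Hypothetical helper: print Livy interpreter properties for an archive URI.
livy_python_props() {
    name=$(basename "$1")
    printf 'livy.spark.yarn.dist.archives=%s\n' "$1"
    printf 'livy.spark.yarn.appMasterEnv.PYSPARK_PYTHON=./%s/bin/python\n' "$name"
}

livy_python_props maprfs:///apps/zeppelin/custom_pyspark_env.zip
```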