HPE Ezmeral Data Fabric 6.1.x is In Maintenance and transitions to "End of Maintenance" in June 2024. Please see the latest documentation.

Zeppelin Docker Container on the MapR Data Platform

The MapR Data Science Refinery product includes a preconfigured Apache Zeppelin notebook, packaged as a Docker container. Apache Zeppelin is an open-source, web-based data science notebook. You can use it with MapR components to conduct data discovery, ETL, machine learning, and data visualization.

You can run the Zeppelin container either on your laptop or on MapR edge nodes. Out of the box, the Zeppelin container image is integrated with open-source data processing engines like Apache Spark, Apache Drill, and Apache Hive, as well as with native MapR engines (MapR File System, MapR Database, and MapR Event Store For Apache Kafka). To use the notebook, you only need to run the Docker image and connect to the container through your browser.
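Running the image comes down to a single `docker run` invocation that publishes the notebook port and passes cluster connection settings into the container. The Python sketch below only assembles that command as an argument list (it does not execute Docker) so the pieces are visible. The image name, the port 9995, and the environment variable names in the usage example are hypothetical placeholders, not confirmed values; take the real ones from the Running the Zeppelin Container documentation.

```python
from __future__ import annotations

import shlex


def build_docker_run(image: str, port_map: dict[int, int], env: dict[str, str]) -> list[str]:
    """Assemble (but do not run) a `docker run` argument list.

    All values are supplied by the caller; the image name, ports, and
    environment variable names used below are hypothetical placeholders.
    """
    cmd = ["docker", "run", "-it"]
    # Publish each container port on the host: -p HOST_PORT:CONTAINER_PORT.
    for host_port, container_port in sorted(port_map.items()):
        cmd += ["-p", f"{host_port}:{container_port}"]
    # Pass cluster connection settings into the container: -e KEY=VALUE.
    for key, value in sorted(env.items()):
        cmd += ["-e", f"{key}={value}"]
    # The image reference always comes last.
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    cmd = build_docker_run(
        image="example/zeppelin-dsr:latest",
        port_map={9995: 9995},
        env={"MAPR_CLUSTER": "my.cluster.com", "MAPR_CLDB_HOSTS": "cldb1.example.com"},
    )
    print(shlex.join(cmd))
```

Keeping the command as a list (rather than one string) avoids shell-quoting problems if you later pass it to `subprocess.run`. After the container starts, you would connect to the published port from your browser.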

Zeppelin provides the following benefits for your data engineering and data science use cases:

An interactive development environment for writing, testing, and sharing data processing code snippets
The ability to run the notebooks in a local client environment, such as on a laptop
Support for a variety of interpreters for integrating with different backend components
Support for extensible visualization libraries

The Zeppelin notebook included with the MapR Data Science Refinery product provides additional benefits:

A small-footprint, prebuilt, certified data science container that is easy to deploy and run
An isolated environment where you can experiment with libraries and packages without affecting other users' work
Secure authentication at the container level over a secure web connection
Preconfigured JDBC interpreters for accessing query engines like Apache Drill and Apache Hive
The MapR Data Platform FUSE-based POSIX Client, which you need to access the MapR File System using shell commands
All client side services that you need to submit Apache Spark jobs, including jobs that access MapR Event Store For Apache Kafka
MapR connectors, which you need to access MapR Database (both binary and JSON tables)
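The JDBC interpreters are preconfigured, but it helps to recognize the connection strings they use when you inspect or adjust interpreter settings. The sketch below builds the standard Apache Drill URL form (Drillbit discovery through ZooKeeper) and the HiveServer2 URL form. The default ports (5181 for ZooKeeper on MapR, 10000 for HiveServer2), the `<cluster>-drillbits` cluster ID, and all host names are assumptions for illustration, not values read from the container.

```python
from __future__ import annotations


def drill_jdbc_url(zk_hosts: list[str], cluster_id: str, zk_port: int = 5181) -> str:
    """Drill JDBC URL that locates Drillbits through a ZooKeeper quorum.

    zk_port=5181 is an assumed MapR ZooKeeper default; cluster_id is
    commonly of the form '<cluster-name>-drillbits' (also an assumption).
    """
    quorum = ",".join(f"{host}:{zk_port}" for host in zk_hosts)
    return f"jdbc:drill:zk={quorum}/drill/{cluster_id}"


def hive_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """HiveServer2 JDBC URL; port 10000 is the usual HiveServer2 default."""
    return f"jdbc:hive2://{host}:{port}/{database}"


if __name__ == "__main__":
    # Hypothetical host and cluster names.
    print(drill_jdbc_url(["zk1.example.com", "zk2.example.com"], "my.cluster.com-drillbits"))
    print(hive_jdbc_url("hs2.example.com"))
```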

See Zeppelin Release Notes for release-specific information.

For additional information about Zeppelin, refer to the open-source Apache Zeppelin documentation.