Running Hive Queries in Zeppelin

This section contains samples of Apache Hive queries that you can run in your Apache Zeppelin notebook.

Prerequisites

Before running Hive queries, make sure you have configured the Hive JDBC interpreter. Also, see MapR Data Science Refinery Support by MapR Core Version for limitations when connecting to a secure MapR Data Platform 6.1 cluster.
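If you have not set up the interpreter yet, note that the %hive paragraph type is backed by Zeppelin's JDBC interpreter. The following is only a sketch of typical settings; the host name hs2node.example.com, the port, the database, and the credentials are placeholders, and a secure cluster requires additional connection properties (for example, SSL and authentication options) that depend on your environment:
    hive.driver     org.apache.hive.jdbc.HiveDriver
    hive.url        jdbc:hive2://hs2node.example.com:10000/default
    hive.user       mapruser1
    hive.password   <password>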

Procedure

  1. Using the shell interpreter, create a source data file:
    %sh
    cat > /tmp/test.data << EOF
    John,Smith
    Brian,May
    Rodger,Taylor
    John,Deacon
    Max,Plank
    Freddie,Mercury
    Albert,Einstein
    Fedor,Dostoevsky
    Lev,Tolstoy
    Niccolo,Paganini
    EOF
  2. Copy the file to the MapR File System:
    To use POSIX shell commands like cp, you must have a MapR filesystem mount point in your container. The example below assumes your mount point is /mapr and your cluster name is my.cluster.com:
    %sh
    cp /tmp/test.data /mapr/my.cluster.com/user/mapruser1
    Alternatively, copy the file with the hadoop command, which does not require a mount point:
    %sh
    hadoop fs -put /tmp/test.data /user/mapruser1
  3. Run the Hive code using the Hive JDBC interpreter:
    %hive
    -- create and load Hive table
    create table test_hive(first_name string, last_name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    load data inpath '/user/mapruser1/test.data' overwrite into table test_hive;
    -- create and load Hive ORC table
    create table test_hive_orc(first_name string, last_name string) stored as orc tblproperties ("orc.compress"="NONE");
    insert overwrite table test_hive_orc  select * from test_hive;
    -- query the Hive ORC table
    select * from test_hive_orc;

    The final select statement returns the ten first_name and last_name pairs loaded from test.data.

  4. Drop the Hive tables created in the example:
    %hive
    drop table test_hive;
    drop table test_hive_orc;
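
Note that the load data inpath statement in step 3 moves test.data from /user/mapruser1 into the Hive warehouse, so no copy remains in that directory after the load. If you also want to remove the local source file created in step 1, you can run a shell paragraph such as the following (a minimal cleanup sketch):
    %sh
    rm /tmp/test.data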