Understanding Zeppelin Interpreters

Apache Zeppelin interpreters enable you to access specific languages and data processing backends. This section describes the interpreters you can use with the MapR system and the use cases they serve.

Supported Zeppelin Interpreters

Apache Zeppelin on the MapR Data Platform supports the following interpreters:

Shell

With the Shell interpreter, you can invoke system shell commands. If you have a MapR File System mount point, you can access the MapR File System using shell commands like ls and cat by using the FUSE-Based POSIX Client. See Running Shell Commands in Zeppelin for examples that use this interpreter.

Pig

The Apache Pig interpreter enables you to run Apache Pig scripts and queries. See Running Pig Scripts in Zeppelin for examples that use this interpreter.

JDBC - Drill and Hive

Apache Zeppelin on the MapR system provides preconfigured Apache Drill and Apache Hive JDBC interpreters. See Running Drill Queries in Zeppelin and Running Hive Queries in Zeppelin for examples that use these interpreters.

Livy

The Apache Livy interpreter is a RESTful interface for interacting with Apache Spark. With this interpreter, you can run interactive Scala, Python, and R shells, and submit Spark jobs.

Spark jobs run in YARN cluster mode, so the Spark driver runs inside an application master process managed by YARN. This has the following advantages:

  • You can close your Zeppelin notebook without killing your Spark jobs.
  • Spark Dynamic Resource Allocation is supported, which lets you set idle timeouts in your Spark context so that resources held by idle executors are reclaimed.
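
The idle timeouts mentioned above correspond to standard Apache Spark dynamic allocation properties. The following Python sketch lists the relevant property names and renders them as `--conf` arguments in the form that `spark-submit` accepts. The values are illustrative assumptions rather than MapR defaults, and the way you supply them through the Zeppelin interpreter settings is not shown here.

```python
# Standard Apache Spark property names; the values are illustrative assumptions.
DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    # Release an executor after it has been idle for this long.
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "10",
    # Dynamic allocation on YARN normally relies on the external shuffle service.
    "spark.shuffle.service.enabled": "true",
}


def as_conf_args(conf):
    """Render a property map as a flat list of --conf key=value arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


conf_args = as_conf_args(DYNAMIC_ALLOCATION_CONF)
print(" ".join(conf_args))
```

A shorter `executorIdleTimeout` returns resources to YARN sooner, at the cost of restarting executors more often when work resumes.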

Starting in the 1.3 release, MapR Data Science Refinery uses a shared Livy session to run all Spark variations. In prior releases, it used separate Livy sessions for Spark, PySpark, and SparkR jobs.
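
To give a rough idea of what a shared session means at the REST level, the sketch below builds (but does not send) statement requests in the shape used by the Apache Livy REST API in version 0.5.0 and later, where each statement carries its own `kind`. The host name, session id, and code snippets are hypothetical, and the assumption that the Zeppelin Livy interpreter issues equivalent requests on your behalf is not confirmed by this topic.

```python
import json

# Hypothetical host; 8998 is the default Apache Livy server port.
LIVY_URL = "http://livy-host:8998"


def statement_request(session_id, kind, code):
    """Describe a request that runs `code` as `kind` in an existing session."""
    body = json.dumps({"kind": kind, "code": code})
    return ("POST", f"{LIVY_URL}/sessions/{session_id}/statements", body)


# Hypothetical id; a real id is returned when the session is created.
session_id = 0

# Three language variations, all addressed to the same session.
requests_to_send = [
    statement_request(session_id, "spark", "sc.parallelize(1 to 10).sum()"),
    statement_request(session_id, "pyspark", "sc.parallelize(range(10)).sum()"),
    statement_request(session_id, "sparkr", "sum(1:10)"),
]

for method, url, body in requests_to_send:
    print(method, url, body)
```

Because every request targets the same session URL, the Scala, Python, and R statements run in one Spark application instead of three.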

The Livy interpreter does not support ZeppelinContext or the Angular Display System. See the description of the Spark interpreter for details about these features.

Although MapR Data Science Refinery includes Livy, you cannot run the Livy UI inside your Zeppelin container.

The following topics contain examples that use the Livy interpreter to access different backend engines:

Spark

The Apache Spark interpreter is available starting in MapR Data Science Refinery 1.1. It provides an alternative to the Livy interpreter.

The Spark interpreter supports the following features not supported by the Livy interpreter:
  • ZeppelinContext - Allows you to create dynamic forms and share objects between Spark Scala and PySpark code
  • Angular Display System - Allows you to display charts using data returned from Spark and to pass variables from the Spark interpreter to the Angular interpreter
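
As a rough illustration of sharing objects through ZeppelinContext, the sketch below uses the `z.put` and `z.get` calls that Apache Zeppelin documents for object exchange. Inside a PySpark paragraph of the Spark interpreter, `z` is supplied by the interpreter. The `_StandInContext` class is a hypothetical stand-in added only so that the example also runs outside Zeppelin, and the key name `max_rows` is an assumption.

```python
class _StandInContext:
    """Hypothetical stand-in for ZeppelinContext; not part of Zeppelin."""

    def __init__(self):
        self._objects = {}

    def put(self, name, value):
        self._objects[name] = value

    def get(self, name):
        return self._objects[name]


try:
    z  # predefined inside a Zeppelin Spark interpreter paragraph
except NameError:
    z = _StandInContext()  # fallback so the sketch runs outside Zeppelin

# One paragraph publishes a value under a name...
z.put("max_rows", 100)

# ...and a later paragraph (Scala or PySpark) reads it back by the same name.
max_rows = z.get("max_rows")
print(max_rows)
```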

Starting in MapR Data Science Refinery 1.3, Spark jobs run in YARN cluster mode. In prior releases, they ran in YARN client mode. Running in YARN cluster mode avoids heavy resource consumption on your container host machine because the Spark driver process does not run on the container host. It also provides the advantages noted earlier for the Livy interpreter.

The following topics contain examples that use the Spark interpreter to access different backend engines:

MapR Database Shell

The MapR Database Shell interpreter allows you to run commands available in MapR Database Shell (JSON Tables) in the Zeppelin UI. Using dbshell commands, you can access MapR Database JSON tables without having to write Spark code. The interpreter supports all dbshell commands except find commands that specify an ordering.

The interpreter is available starting in MapR Data Science Refinery 1.2. You do not have to perform any additional configuration steps to use this interpreter.

Specify the following in the Zeppelin UI to invoke the interpreter:
%maprdb

See Running MapR Database Shell Commands in Zeppelin for examples that use this interpreter.

Livy vs Spark Interpreters

Starting in MapR Data Science Refinery 1.3, both the Livy and Spark interpreters run in YARN cluster mode. The primary reason to choose the Spark interpreter over the Livy interpreter is therefore its support for the visualization features (ZeppelinContext and the Angular Display System) that the Livy interpreter lacks.

NOTE Neither interpreter supports Spark standalone mode.

Zeppelin Interpreter Use Cases

The table below summarizes which interpreters to use to access different backend engines for different data processing goals:

Data Processing Goal                      Zeppelin Interpreter   Backend Engine
Data discovery, exploratory querying      Livy, Spark            Spark SQL
                                          JDBC                   Hive, Drill
                                          Shell                  MapR File System
                                          MapR Database Shell    MapR Database JSON
ETL, preparation                          Livy, Spark            Spark, PySpark, Spark SQL, Spark Streaming
                                          Livy, Spark            MapR Database (through the MapR Database Connectors for Apache Spark)
                                          Livy, Spark            MapR Event Store For Apache Kafka (through Spark jobs that query MapR Event Store For Apache Kafka)
                                          JDBC                   Hive
                                          Pig                    MapReduce
Machine and deep learning, data science   Livy, Spark            SparkML
Reporting, visualization                  JDBC                   Hive, Drill

NOTE See MapR Data Science Refinery Support by MapR Core Version for limitations in version support when accessing MapR Event Store For Apache Kafka.
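
The table above can also be read as a lookup from a goal and a backend engine to the interpreters that reach it. The Python sketch below transcribes the rows into a list and adds a small helper; the names `USE_CASES` and `interpreters_for` are hypothetical and exist only for this illustration.

```python
# Rows transcribed from the table: (goal, interpreters, backend engines).
USE_CASES = [
    ("Data discovery, exploratory querying", ("Livy", "Spark"), ("Spark SQL",)),
    ("Data discovery, exploratory querying", ("JDBC",), ("Hive", "Drill")),
    ("Data discovery, exploratory querying", ("Shell",), ("MapR File System",)),
    ("Data discovery, exploratory querying", ("MapR Database Shell",),
     ("MapR Database JSON",)),
    ("ETL, preparation", ("Livy", "Spark"),
     ("Spark", "PySpark", "Spark SQL", "Spark Streaming")),
    ("ETL, preparation", ("Livy", "Spark"), ("MapR Database",)),
    ("ETL, preparation", ("Livy", "Spark"),
     ("MapR Event Store For Apache Kafka",)),
    ("ETL, preparation", ("JDBC",), ("Hive",)),
    ("ETL, preparation", ("Pig",), ("MapReduce",)),
    ("Machine and deep learning, data science", ("Livy", "Spark"), ("SparkML",)),
    ("Reporting, visualization", ("JDBC",), ("Hive", "Drill")),
]


def interpreters_for(goal, backend):
    """Return the interpreters listed for `goal` that reach `backend`."""
    found = []
    for row_goal, interpreters, backends in USE_CASES:
        if row_goal == goal and backend in backends:
            for name in interpreters:
                if name not in found:
                    found.append(name)
    return found


print(interpreters_for("ETL, preparation", "Hive"))
print(interpreters_for("Machine and deep learning, data science", "SparkML"))
```

A combination that does not appear in the table, such as reporting against MapReduce, returns an empty list.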

The following are general guidelines for choosing between the Livy and Spark interpreters:

  • Use Livy for jobs that are long running or resource intensive
  • Use Spark if you use visualization features that Livy does not support

Unsupported Zeppelin Interpreters

Apache Zeppelin on the MapR Data Platform does not support the HBase interpreter. To access MapR Database binary tables, use the MapR Database Binary Connector for Apache Spark with either the Livy or Spark interpreter.

Sequential Execution of Notebook Paragraphs

Starting in the 1.3 release, MapR Data Science Refinery runs paragraphs in a notebook sequentially rather than in parallel. This allows paragraphs to run properly when they have dependencies on earlier paragraphs in the same notebook.
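
The effect of paragraph dependencies can be shown with a toy model in Python. It is not how Zeppelin schedules paragraphs; it only assumes that each paragraph is a piece of code sharing state with the paragraphs before it, and the paragraph contents are made up for the example.

```python
# Each string stands for one notebook paragraph, listed in notebook order.
paragraphs = [
    "rows = [3, 1, 2]",        # paragraph 1 defines the data
    "ordered = sorted(rows)",  # paragraph 2 depends on paragraph 1
    "largest = ordered[-1]",   # paragraph 3 depends on paragraph 2
]


def run_sequentially(sources):
    """Run each paragraph in the given order in one shared namespace."""
    namespace = {}
    for source in sources:
        exec(source, namespace)
    return namespace


result = run_sequentially(paragraphs)
print(result["largest"])

# Running a dependent paragraph before the one it relies on fails.
try:
    run_sequentially(reversed(paragraphs))
    out_of_order_failed = False
except NameError:
    out_of_order_failed = True
print(out_of_order_failed)
```

When the paragraphs run in notebook order, each one finds the variables it needs. When a later paragraph starts before an earlier one has finished, it fails in the way the reversed run does here.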