Configuring the Spark Interpreter

The Spark interpreter is available starting in the 1.1 release of the MapR Data Science Refinery. It provides support for Spark Python, SparkR, Basic Spark, and Spark SQL jobs. To use the Spark interpreter with each of these variants, you must complete some setup, including configuring Zeppelin and installing software on your MapR cluster.

You must also issue your docker run command with the parameters the Spark interpreter requires; see the documentation on launching the container for details about these parameters.
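
As a rough, non-authoritative sketch, a launch command might look like the following. The cluster name, CLDB host, user, mount path, and IP address are placeholder values, host networking is an assumption (so that the Spark driver running in the container is reachable from the cluster), and your release's documentation lists the authoritative set of parameters:

docker run -it --network=host \
  -e MAPR_CLUSTER=my.cluster.com \
  -e MAPR_CLDB_HOSTS=cldb1.example.com \
  -e MAPR_CONTAINER_USER=mapruser \
  -e MAPR_MOUNT_PATH=/mapr \
  -e HOST_IP=10.10.1.100 \
  maprtech/data-science-refinery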

Spark Python

You must install Python on your MapR cluster nodes to run Python code with the Spark interpreter. You do not need to install it in your container, since the interpreter submits jobs through YARN and the executors that run your Python code are launched on the cluster nodes.

To use Python in the Spark interpreter, specify the following in your notebook:
%spark.pyspark

By default, this invokes Python 2. To switch Python versions, see Python Version.
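
For example, a minimal paragraph might look like the following sketch; the file path is a placeholder, and sc is the SparkContext that Zeppelin predefines:

%spark.pyspark
# Count the lines of a text file stored in the cluster file system
# (the path below is a placeholder)
lines = sc.textFile("/user/mapruser/data.txt")
print(lines.count())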

To install custom Python packages, see Installing Custom Packages for PySpark. That topic also describes how to use Python 3 with custom packages.

NOTE: Although the 1.3 release includes IPython with the Spark interpreter, MapR Data Science Refinery does not support this feature.

SparkR

The Zeppelin container includes R. However, some Apache SparkR jobs also require you to install R on your MapR cluster nodes before they can run in the Spark interpreter.

To use R in the Spark interpreter, specify the following in your notebook:
%spark.r
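
For example, a minimal paragraph might look like this sketch, which builds a SparkDataFrame from R's built-in faithful dataset:

%spark.r
# Create a SparkDataFrame from the built-in faithful dataset and preview it
df <- createDataFrame(faithful)
head(df)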

Spark Jobs

By default, the Spark interpreter is configured to submit Apache Spark jobs in YARN client mode; it does not support YARN cluster mode. Make sure you follow the steps described in Installing Spark on YARN to install Spark on your MapR cluster.
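
For example, a basic Spark paragraph might look like the following sketch; spark is the SparkSession that Zeppelin predefines for Spark 2.x:

%spark
// Create a small distributed Dataset and count its rows
val ds = spark.range(0, 1000)
println(ds.count())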

To run Spark jobs in parallel, you must modify the Spark interpreter setting in the Zeppelin UI so that it instantiates the interpreter Per Note. You can set the instantiation to either of the two options: scoped or isolated.

Hive Tables

To access Apache Hive tables using the Spark interpreter, you must make the hive-site.xml configuration file from your Hive cluster available to Spark running in your Zeppelin container. Follow the same steps that describe how to access Hive tables with the Livy interpreter.
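
Once hive-site.xml is in place, you can verify access with a Spark SQL paragraph; this sketch simply lists the Hive tables the interpreter can see:

%spark.sql
-- List the Hive tables visible to the Spark interpreter
SHOW TABLES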