What's New in Data Science Refinery 1.3

Data Science Refinery 1.3 introduces new features and some changes in behavior for existing features from prior releases. This release requires that you connect to a 6.1.0 (or later) cluster.

New Features

The following are new features in this release:

Interpreter Lifecycle Management

Prior to the 1.3 release, if a Zeppelin interpreter was idle and using excessive resources, you had to either restart or kill the interpreter to reclaim those resources. Starting in 1.3, Data Science Refinery terminates interpreters that have been idle for an hour. You can configure this timeout threshold. For details, see Idle Interpreter Timeout Threshold in Zeppelin Docker Run Parameters (MapR 6.1.0).
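The idle check can be pictured with a minimal Python sketch. It is only a model of the behavior described here, not how Data Science Refinery or Zeppelin implements it. The function name, the interpreter names, the timestamps, and the choice to treat an interpreter as idle at exactly one hour are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# One hour, matching the default idle timeout described above.
IDLE_THRESHOLD = timedelta(hours=1)


def idle_interpreters(last_used, now, threshold=IDLE_THRESHOLD):
    """Return the names of interpreters idle for at least `threshold`.

    `last_used` maps an interpreter name to the time of its last activity.
    This helper is hypothetical and exists only to illustrate the rule.
    """
    return sorted(
        name for name, timestamp in last_used.items()
        if now - timestamp >= threshold
    )


# Hypothetical activity log: "spark" was last used 90 minutes ago,
# "livy" 5 minutes ago.
now = datetime(2018, 9, 1, 12, 0)
last_used = {
    "spark": now - timedelta(minutes=90),
    "livy": now - timedelta(minutes=5),
}
print(idle_interpreters(last_used, now))  # ['spark']
```

With a shorter threshold, more interpreters would qualify for termination, which is the trade-off to weigh when changing the configured value.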

Helium Repository Browser

Starting with the 1.3 release, Data Science Refinery supports the Helium repository browser for enabling Zeppelin visualization packages. This provides a simpler procedure for enabling these packages. See Using Visualization Packages in Zeppelin (MapR 6.1.0) for detailed instructions.

Configuration Storage

Starting with the Data Science Refinery 1.3 release, you can store certain Zeppelin configuration files in the file system, which enables you to share them across multiple containers. For details, see Configuration Storage in Zeppelin Docker Run Parameters (MapR 6.1.0).

Default Drill JDBC Connection URL

Starting with Data Science Refinery 1.3, you can configure the default Drill JDBC connection URL. For more information, see Default Drill JDBC Connection URL in Zeppelin Docker Run Parameters (MapR 6.1.0).

Building Your Own Docker Image

Starting with the 1.3 release, you can build your own custom Docker image of Data Science Refinery. See Building your own MapR Data Science Refinery Docker Image (MapR 6.1.0) for more information.

Changes in Existing Features

The following are changes in behavior from prior releases:

YARN Cluster Mode for Spark Interpreter Jobs

Prior to the 1.3 release, Spark interpreter jobs ran in YARN client mode. Starting with the 1.3 release, they run in YARN cluster mode. This mode reduces Spark resource utilization on the host machine of your Data Science Refinery container. See Understanding Zeppelin Interpreters (MapR 6.1.0) for details.

Shared Livy Sessions

In prior releases, the Livy interpreter used separate Livy sessions for Spark, PySpark, and SparkR jobs. Starting in the 1.3 release, it uses a shared Livy session to run all Spark variations. This reduces resource utilization in your cluster.

Sequential Execution of Notebook Paragraphs

Starting with the 1.3 release, Data Science Refinery runs paragraphs in a notebook sequentially rather than in parallel. This allows paragraphs to run properly when they have dependencies on earlier paragraphs in the same notebook.
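To see why ordering matters, consider the following minimal Python sketch. It models a notebook as a list of code strings that share one namespace. This is an illustration only, not Zeppelin's scheduler; the run_sequentially helper and the paragraph contents are hypothetical.

```python
def run_sequentially(paragraphs):
    """Run each paragraph in order against one shared namespace."""
    namespace = {}
    for source in paragraphs:
        # Each paragraph can read the names defined by earlier paragraphs.
        exec(source, namespace)
    return namespace


# Hypothetical notebook: every paragraph depends on the one before it.
paragraphs = [
    "rows = [3, 1, 2]",
    "total = sum(rows)",
    "average = total / len(rows)",
]

result = run_sequentially(paragraphs)
print(result["average"])  # 2.0
```

If the second paragraph started before the first one finished, rows would not yet be defined and the paragraph would fail. Running in order removes that race.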

Hive JDBC Interpreter and Secure Clusters

Starting with the 1.3 release, you must specify ssl=true in your Hive JDBC URL when connecting to a secure cluster. See Installing Custom Packages for PySpark Using Conda (MapR 6.1.0) for an example.
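As a rough illustration, the following Python sketch adds ssl=true to the session-variable section of a HiveServer2 JDBC URL, which is the semicolon-separated part that follows the database name. The with_ssl helper, the host name node1, port 10000, and the auth=maprsasl setting in the usage lines are placeholders chosen for this example; they are not taken from the Data Science Refinery documentation.

```python
import re


def with_ssl(hive_jdbc_url):
    """Return the URL with ssl=true set in its session variables.

    Assumes the general HiveServer2 layout:
    jdbc:hive2://host:port/db;session_vars?hive_conf#hive_vars
    """
    # Keep any "?hive_conf" or "#hive_vars" tail untouched.
    match = re.search(r"[?#]", hive_jdbc_url)
    cut = match.start() if match else len(hive_jdbc_url)
    head, tail = hive_jdbc_url[:cut], hive_jdbc_url[cut:]

    base, *session_vars = head.split(";")
    # Drop empty entries and any existing ssl setting, then force ssl=true.
    session_vars = [
        var for var in session_vars
        if var and not var.lower().startswith("ssl=")
    ]
    session_vars.append("ssl=true")
    return ";".join([base] + session_vars) + tail


print(with_ssl("jdbc:hive2://node1:10000/default"))
# jdbc:hive2://node1:10000/default;ssl=true
print(with_ssl("jdbc:hive2://node1:10000/default;ssl=false;auth=maprsasl"))
# jdbc:hive2://node1:10000/default;auth=maprsasl;ssl=true
```

In practice you would simply type the resulting URL into the interpreter settings; the helper only makes the placement of ssl=true explicit.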

Python Versions with the Livy Interpreter

Starting with the 1.3 release, you can no longer run both Python 2 and Python 3 with the Livy interpreter. You can run only one or the other. By default, the interpreter runs Python 2. To switch to Python 3, see Python Version in Zeppelin Docker Run Parameters (MapR 6.1.0).

The limitation also applies if you are installing custom Python packages. See Installing Custom Packages for PySpark Using Conda (MapR 6.1.0) for instructions on how to install Python 2 vs Python 3 custom packages.

Running Zeppelin as a Kubernetes Service

The DEPLOY_MODE parameter in your Kubernetes pod manifest file has been renamed to ZEPPELIN_DEPLOY_MODE. You can still use DEPLOY_MODE, but Data Science Refinery 1.3 returns a warning indicating that the parameter is deprecated. See Running MapR Data Science Refinery as a Kubernetes Service (MapR 6.1.0) for an example of a pod manifest file.
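The rename follows a common deprecation pattern, sketched below in Python. This is not Data Science Refinery's actual startup code. The resolve_deploy_mode helper is hypothetical, the value kubernetes is a placeholder, and the choice to prefer the new name when both are set is an assumption.

```python
import os
import warnings


def resolve_deploy_mode(env=None):
    """Return the deploy mode, preferring the new parameter name.

    Falls back to the deprecated DEPLOY_MODE and emits a warning,
    mirroring the behavior described above.
    """
    env = os.environ if env is None else env
    if "ZEPPELIN_DEPLOY_MODE" in env:
        return env["ZEPPELIN_DEPLOY_MODE"]
    if "DEPLOY_MODE" in env:
        warnings.warn(
            "DEPLOY_MODE is deprecated; use ZEPPELIN_DEPLOY_MODE instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return env["DEPLOY_MODE"]
    return None  # neither parameter is set


# Placeholder value; check your pod manifest for the real one.
print(resolve_deploy_mode({"ZEPPELIN_DEPLOY_MODE": "kubernetes"}))  # kubernetes
```

Updating the manifest to use ZEPPELIN_DEPLOY_MODE avoids the warning and keeps the manifest working if the old name is removed in a later release.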

Notebook Storage Using the File System

Starting with the 1.3 release, you no longer need to use the FUSE-based POSIX client to store your notebooks in the file system. See Zeppelin Docker Run Parameters (MapR 6.1.0) for details.