What's New in Data Science Refinery 1.3
Data Science Refinery 1.3 introduces new features and some changes in behavior for existing features from prior releases. This release requires that you connect to a 6.1.0 (or later) cluster.
New Features
The following are new features in this release:
- Interpreter Lifecycle Management
-
Prior to the 1.3 release, if a Zeppelin interpreter was idle and using excessive resources, you had to either restart or kill the interpreter to reclaim those resources. Starting in 1.3, Data Science Refinery terminates interpreters that have been idle for an hour. You can configure this timeout threshold. For details, see the Idle Interpreter Timeout Threshold in Zeppelin Docker Run Parameters (MapR 6.1.0).
- Helium Repository Browser
-
Starting with the 1.3 release, Data Science Refinery supports the Helium repository browser for enabling Zeppelin visualization packages. This provides a simpler procedure for enabling these packages. See Using Visualization Packages in Zeppelin (MapR 6.1.0) for detailed instructions.
- Configuration Storage
-
Starting with the Data Science Refinery 1.3 release, you can store certain Zeppelin configuration files in the file system, which enables you to share them across multiple containers. For details, see Configuration Storage in Zeppelin Docker Run Parameters (MapR 6.1.0).
- Default Drill JDBC Connection String
-
Starting with Data Science Refinery 1.3, you can configure the default Drill JDBC connection URL. For more information, see Default Drill JDBC Connection URL in Zeppelin Docker Run Parameters (MapR 6.1.0).
- Building Your Own Docker Image
-
Starting with the 1.3 release, you can build your own custom Docker image of Data Science Refinery. See Building your own MapR Data Science Refinery Docker Image (MapR 6.1.0) for more information.
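To make the idle interpreter timeout described under Interpreter Lifecycle Management more concrete, the following is a minimal Python sketch of the general idea, not the Data Science Refinery implementation. The function name, the interpreter names, and the timestamps are all hypothetical; only the one-hour threshold comes from the description above.

```python
from datetime import datetime, timedelta

# Assumption: mirrors the one-hour idle threshold described above.
DEFAULT_IDLE_TIMEOUT = timedelta(hours=1)


def find_idle_interpreters(last_activity, now, timeout=DEFAULT_IDLE_TIMEOUT):
    """Return the names of interpreters idle for at least `timeout`.

    `last_activity` maps a (hypothetical) interpreter name to the time
    it last ran a paragraph.
    """
    return sorted(
        name for name, seen in last_activity.items() if now - seen >= timeout
    )


# Illustrative data only: three interpreters with different idle times.
now = datetime(2019, 1, 1, 12, 0)
last_activity = {
    "spark": now - timedelta(minutes=90),  # idle longer than the threshold
    "livy": now - timedelta(minutes=10),   # recently used
    "jdbc": now - timedelta(hours=1),      # idle for exactly the threshold
}

# These are the interpreters a lifecycle manager would consider terminating.
print(find_idle_interpreters(last_activity, now))
```

Passing a larger `timeout` value corresponds to raising the configurable threshold, so fewer interpreters are selected.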
Changes in Existing Features
The following are changes in behavior from prior releases:
- YARN Cluster Mode for Spark Interpreter Jobs
-
Prior to the 1.3 release, Spark interpreter jobs ran in YARN client mode. Starting with the 1.3 release, they run in YARN cluster mode. This mode reduces Spark resource utilization on the host machine of your Data Science Refinery container. See Understanding Zeppelin Interpreters (MapR 6.1.0) for details.
- Shared Livy Sessions
-
In prior releases, the Livy interpreter used separate Livy sessions for Spark, PySpark, and SparkR jobs. Starting in the 1.3 release, it uses a single shared Livy session to run all Spark variations. This reduces resource utilization in your cluster.
- Sequential Execution of Notebook Paragraphs
-
Starting with the 1.3 release, Data Science Refinery runs paragraphs in a notebook sequentially rather than in parallel. This allows paragraphs to run properly when they have dependencies on earlier paragraphs in the same notebook.
- Hive JDBC Interpreter and Secure Clusters
-
Starting with the 1.3 release, you must specify ssl=true in your Hive JDBC URL when connecting to a secure cluster. See Installing Custom Packages for PySpark Using Conda (MapR 6.1.0) for an example.
- Python Versions with the Livy Interpreter
-
Starting with the 1.3 release, you can no longer run both Python 2 and Python 3 with the Livy interpreter. You can run only one or the other. By default, the interpreter runs Python 2. To switch to Python 3, see Python Version in Zeppelin Docker Run Parameters (MapR 6.1.0).
This limitation also applies when you install custom Python packages. See Installing Custom Packages for PySpark Using Conda (MapR 6.1.0) for instructions on installing custom packages for Python 2 versus Python 3.
- Running Zeppelin as a Kubernetes Service
-
The DEPLOY_MODE parameter in your Kubernetes pod manifest file has been renamed to ZEPPELIN_DEPLOY_MODE. You can still use DEPLOY_MODE, but Data Science Refinery 1.3 returns a warning indicating that the parameter is deprecated. See Running MapR Data Science Refinery as a Kubernetes Service (MapR 6.1.0) for an example of a pod manifest file.
- Notebook Storage Using the File System
-
Starting with the 1.3 release, to store your notebooks in the file system, you no longer need to use the FUSE-based POSIX client. See Zeppelin Docker Run Parameters (MapR 6.1.0) for details.
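Two of the changes above, the ssl=true requirement for Hive JDBC URLs and the DEPLOY_MODE rename, lend themselves to a small pre-upgrade check. The following Python sketch is illustrative only and is not part of Data Science Refinery. The helper names, host name, and mode value are hypothetical. The URL handling assumes the usual Hive JDBC layout, in which session variables are separated by semicolons and come before any "?" or "#" part. The precedence between the two parameter names is also an assumption.

```python
import warnings


def with_ssl(jdbc_url):
    """Return the Hive JDBC URL with ssl=true among its session variables."""
    # Session variables sit before any '?' (Hive conf) or '#' (Hive var) part.
    cut = len(jdbc_url)
    for sep in "?#":
        pos = jdbc_url.find(sep)
        if pos != -1:
            cut = min(cut, pos)
    head, tail = jdbc_url[:cut], jdbc_url[cut:]

    base, *session_vars = head.split(";")
    # Drop any existing ssl=... entry so the result carries ssl=true exactly once.
    kept = [v for v in session_vars if v and not v.lower().startswith("ssl=")]
    return ";".join([base] + kept + ["ssl=true"]) + tail


def resolve_deploy_mode(env):
    """Read the deploy mode, preferring the new parameter name.

    Assumption: the new name wins when both are set. The deprecated name
    still works but triggers a warning, as described above.
    """
    if "ZEPPELIN_DEPLOY_MODE" in env:
        return env["ZEPPELIN_DEPLOY_MODE"]
    if "DEPLOY_MODE" in env:
        warnings.warn(
            "DEPLOY_MODE is deprecated; use ZEPPELIN_DEPLOY_MODE",
            DeprecationWarning,
        )
        return env["DEPLOY_MODE"]
    return None


# Hypothetical host and mode value, for illustration only.
print(with_ssl("jdbc:hive2://hive-host:10000/default"))
print(resolve_deploy_mode({"ZEPPELIN_DEPLOY_MODE": "example-mode"}))
```

Calling with_ssl on a URL that already contains ssl=true returns it unchanged, so the check can be run repeatedly over existing interpreter settings.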