MapR Data Science Refinery

The MapR Data Science Refinery is an easy-to-deploy and scalable data science toolkit with native access to all platform assets and superior out-of-the-box security.

The MapR Data Science Refinery offers:

Access to All Platform Assets
The MapR FUSE-based POSIX Client allows app servers, web servers, and other client nodes and apps to read and write data directly and securely to a MapR cluster, as they would to a Linux filesystem. In addition, Apache Spark connectors are provided for interacting with both MapR Database and MapR Event Store For Apache Kafka.
Superior Security
The MapR Platform provides enhanced security. Apache Zeppelin on MapR leverages and integrates with this security layer using the built-in capabilities provided by the MapR Persistent Application Container (PACC).
Extensibility
Apache Zeppelin is paired with the Helium framework to offer pluggable visualization capabilities.
Simplified Deployment
A preconfigured Zeppelin Docker container lets you use MapR as a persistent data store.
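
To illustrate the filesystem-style access described under Access to All Platform Assets, the following is a minimal sketch rather than an official example. It assumes the cluster is exposed through the POSIX client at a mount path such as /mapr/my.cluster.com; that path, the file name, and the helper write_and_read are hypothetical. Because access is through ordinary file operations, the code uses only the Python standard library, and a temporary directory stands in for the mount so the sketch runs anywhere.

```python
import tempfile
from pathlib import Path

# Hypothetical mount point (an assumption): the real path depends on how
# the POSIX client is configured on the client node.
ASSUMED_MOUNT = Path("/mapr/my.cluster.com")


def write_and_read(base_dir: Path, relative_name: str, text: str) -> str:
    """Write text to a file under base_dir, then read it back.

    Only ordinary file I/O is used; no cluster-specific API is involved.
    """
    target = base_dir / relative_name
    # Create intermediate directories, as on any POSIX filesystem.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target.read_text(encoding="utf-8")


if __name__ == "__main__":
    # A temporary directory stands in for ASSUMED_MOUNT in this sketch.
    with tempfile.TemporaryDirectory() as tmp:
        content = write_and_read(Path(tmp), "user/demo/notes.txt", "hello\n")
        print(content, end="")
```

On a client node where the mount actually exists, passing ASSUMED_MOUNT (adjusted to the real path) instead of the temporary directory would direct the same reads and writes to the cluster, subject to the permissions of the user running the code.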

Getting Started Using the Data Science Refinery with Zeppelin

You can deploy the Apache Zeppelin Docker container included in the Data Science Refinery on any of the following, listed from most to least recommended:

  • Container orchestration engines; for example: Docker Swarm, Kubernetes, OpenShift
  • Cloud instances
  • Shared edge node
  • Personal computers

Note: Starting in version 1.2, you can deploy the Data Science Refinery on a MapR cluster node. If you choose this deployment mode, make sure you account for the resource requirements of the Data Science Refinery.

If you are already familiar with Apache Zeppelin on MapR and want to skip to the deployment instructions, see Running the Zeppelin Container.