Spark

Apache Spark is an open-source processing engine that you can use to process Hadoop data. The following diagram shows the components involved in running Spark jobs. See Spark Cluster Mode Overview for further details on the different components.

MapR supports the following three types of cluster managers:

- Spark's own standalone cluster manager
- YARN
- Apache Mesos

NOTE: Spark on Mesos is available starting in the Spark 2.1.0-1707 release.
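
As a rough illustration of how the choice of cluster manager shows up in practice, the following sketch builds the master URL that an application would pass to `spark-submit --master`. It is a minimal example, not MapR tooling: the helper name `master_url` and the host `node1.example.com` are hypothetical, and the ports are the Apache Spark and Mesos defaults (7077 and 5050), which your cluster may override.

```python
# Illustrative helper (hypothetical, not part of MapR or Spark):
# maps a cluster manager name to the master URL form used by Spark 2.x.

# URL scheme and default master port per cluster manager.
# 7077 and 5050 are upstream defaults; a given cluster may use other ports.
_DEFAULTS = {
    "standalone": ("spark", 7077),
    "mesos": ("mesos", 5050),
}


def master_url(manager, host=None, port=None):
    """Return the value to pass to spark-submit --master."""
    manager = manager.lower()
    if manager == "yarn":
        # On YARN, the ResourceManager address comes from the Hadoop
        # configuration, so the master URL is just the literal "yarn".
        return "yarn"
    if manager not in _DEFAULTS:
        raise ValueError("unknown cluster manager: %s" % manager)
    if host is None:
        raise ValueError("%s requires a master host" % manager)
    scheme, default_port = _DEFAULTS[manager]
    return "%s://%s:%d" % (scheme, host, port or default_port)


if __name__ == "__main__":
    # node1.example.com is a placeholder host name.
    for name in ("standalone", "yarn", "mesos"):
        print(name, "->", master_url(name, "node1.example.com"))
```

The resulting string is what you would supply as the `--master` argument, for example `spark-submit --master yarn ...` for Spark on YARN.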

The configuration and operational steps for Spark differ depending on the Spark mode (cluster manager) you choose to install. The steps to integrate Spark with other components are the same for Standalone and YARN modes, except where otherwise noted.

This section provides documentation about configuring and using Spark with MapR, but it does not duplicate the Apache Spark documentation.

You can also refer to additional documentation available on the Apache Spark Product Page.

(Topic last modified: 2019-01-25)