Understanding the MapR-DB OJAI Connector for Spark
The MapR-DB OJAI Connector for Spark enables you to build real-time and batch pipelines between your data and MapR-DB JSON. Before getting started, it is important that you understand Spark terminology and workflow, system requirements and support, and OJAI connector and API features.
The MapR-DB OJAI connector includes a set of APIs that enable you to write applications that consume MapR-DB JSON tables and use them in Spark. The MapR-DB OJAI Connector for Apache Spark is a companion to the MapR-DB Binary Connector for Apache Spark, which provides the equivalent functionality for MapR-DB Binary tables.
MapR-DB OJAI Connector with Spark Workflow
You can use the MapR-DB OJAI Connector to extract data from MapR-DB or MapR-FS, transform that data using either Spark or Spark SQL, and then load it into MapR-DB JSON.
MapR-DB OJAI Connector for Apache Spark Features
Principal features of the MapR-DB OJAI Connector for Apache Spark include the following:
- Support for Scala and, beginning with MEP 4.1, Java and Python APIs. This matrix shows the programming languages and features supported:
               Scala   Java   Python
    RDD        Yes     Yes    No
    DataFrame  Yes     Yes    Yes
    Dataset    Yes     Yes    No
    DStream    Yes     No     No
- APIs that enable you to load data from a MapR-DB JSON table to an Apache Spark RDD, DataFrame, or Dataset
- Projection and filter pushdown for better performance
- Custom partitioner for RDDs that enables you to partition data for better performance
- APIs that save an Apache Spark RDD, DataFrame, or DStream to a MapR-DB JSON table using either normal or bulk insert
- Support for Scala and Java bean classes
- Support for data locality
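As a rough illustration of the load, pushdown, and save features listed above, the following Scala sketch reads a JSON table into an RDD, applies a filter and a projection, and writes the result to another table. It assumes the connector is on the Spark classpath and that the job runs against a MapR cluster. The table paths and field names (`/tmp/user_profiles`, `address.city`, and so on) are hypothetical, and the calls reflect the connector's documented Scala entry points as best understood here, so check them against the API reference for your MEP version.

```scala
import org.apache.spark.sql.SparkSession
import com.mapr.db.spark._       // RDD entry points: loadFromMapRDB, saveToMapRDB, field
import com.mapr.db.spark.sql._   // DataFrame entry points on SparkSession

object OjaiConnectorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ojai-connector-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical table paths; replace with tables that exist on your cluster.
    val sourceTable = "/tmp/user_profiles"
    val targetTable = "/tmp/user_profiles_san_jose"

    // Load a JSON table as an RDD. The where() condition and select() projection
    // are handed to the connector so that it can push them down to MapR-DB.
    val sanJoseUsers = sc.loadFromMapRDB(sourceTable)
      .where(field("address.city") === "San Jose")
      .select("_id", "first_name", "last_name")

    // Save the RDD with a normal (non-bulk) insert.
    // createTable = true assumes the target table does not exist yet.
    sanJoseUsers.saveToMapRDB(targetTable, createTable = true)

    // Load the same source table as a DataFrame and preview two columns.
    val profiles = spark.loadFromMapRDB(sourceTable)
    profiles.select("first_name", "last_name").show(5)

    spark.stop()
  }
}
```

A job like this would typically be packaged and launched with `spark-submit`. Whether a given condition is actually pushed down depends on the connector version, so treat the comments as a description of the intent rather than a guarantee.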
The following features are not supported:
- MapR-DB Binary tables
  Only MapR-DB JSON tables are supported; access to MapR-DB Binary tables is provided through the MapR-DB Binary Connector for Apache Spark.
- Secondary indexes
Supported Product Versions and System Requirements
To use the MapR-DB OJAI Connector for Apache Spark, you must have the following minimum software versions:
- MapR 5.2.1 or later
- MEP 3.0 or later
- Spark 2.1.0 or later
- Scala 2.11 or later
- Java 8 or later
OJAI API
The MapR-DB OJAI Connector for Apache Spark uses the OJAI API internally to access MapR-DB JSON tables.