Saving an Apache Spark DStream to a MapR Database JSON Table

The MapR Database OJAI Connector for Apache Spark enables you to use MapR Database as a sink for Apache Spark DStreams.

NOTE Saving of Apache Spark DStream to MapR Database JSON table is currently only supported in Scala.

The following API saves a DStream[OJAIDocument] object to a MapR Database table:

def saveToMapRDB(tablename: String, createTable: Boolean,
          bulkInsert: Boolean, idFieldPath: String): Unit

The parameters are as follows:

Parameter Default Description
tableName Not applicable The name of the MapR Database table to which you are saving the DStream.
createTable false Creates the table before saving the DStream. Note that if the table already exists and createTable is set to true, the API throws an exception.
idFieldPath _id Specifies the key to be used for the DStream.
bulkInsert false Loads a group of streams simultaneously. bulkInsert is similar to a bulk load in MapReduce.
NOTE The only required parameter for this function is tableName. All the others are optional.
The following example creates a DStream object, converts it to a DStream[OJAIDocument] object, and then stores it in MapR Database:
val clicksStream: DStream[String] = createKafkaStream(…)
clicksStream.map(MapRDBSpark.newDocument()).saveToMapRDB("/clicks", createTable=true)
NOTE You must use the map(MapRDBSpark.newDocument()) API to convert the DStream object to a DStream[OJAIDocument] object.
If clicksStream is a DStream of Strings, it can be saved to MapR Database using the saveToMapRDB API:
clicksStream.map(MapRDBSpark.newDocument(_)).saveToMapRDB("/clicks", createTable = true);
NOTE To use the saveToMapRDB API, you need to transform the DStream object to a DStream[OJAIDocument] by using the Apache Spark Map API.