Saving an Apache Spark DStream to a MapR Database JSON Table
The MapR Database OJAI Connector for Apache Spark enables you to use MapR Database as a sink for Apache Spark DStreams.
NOTE Saving an Apache Spark DStream to a MapR Database JSON table is currently supported only in Scala.
The following API saves a DStream[OJAIDocument] object to a MapR Database table:

def saveToMapRDB(tablename: String, createTable: Boolean, bulkInsert: Boolean, idFieldPath: String): Unit
The parameters are as follows:
Parameter | Default | Description
tablename | Not applicable | The name of the MapR Database table to which you are saving the DStream.
createTable | false | Creates the table before saving the DStream. If the table already exists and createTable is set to true, the API throws an exception.
idFieldPath | _id | Specifies the field to use as the key (_id) for the documents in the DStream.
bulkInsert | false | Inserts a group of documents together; similar to a bulk load in MapReduce.
NOTE The only required parameter for this function is tablename. All the others are optional.
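As a sketch of how the optional parameters combine, the call below creates the table, enables bulk insert, and keys documents on a field other than _id. The stream name sensorStream, the table path /sensors, and the field deviceId are illustrative assumptions, not part of the API:

```scala
// Sketch only: assumes the MapR Database OJAI Connector for Apache Spark
// is on the classpath and sensorStream is a DStream[String] of JSON records.
import com.mapr.db.spark._
import com.mapr.db.spark.streaming._

// Convert each JSON string to an OJAIDocument.
val sensorDocs = sensorStream.map(MapRDBSpark.newDocument(_))

// Create the table if it does not exist, insert in bulk,
// and use the hypothetical "deviceId" field as the row key.
sensorDocs.saveToMapRDB("/sensors",
  createTable = true,
  bulkInsert = true,
  idFieldPath = "deviceId")
```

Because the parameters are named, you can pass any subset of the optional ones in any order.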
The following example creates a DStream object, converts it to a DStream[OJAIDocument] object, and then stores it in MapR Database:

val clicksStream: DStream[String] = createKafkaStream(…)
clicksStream.map(MapRDBSpark.newDocument(_)).saveToMapRDB("/clicks", createTable = true)
NOTE You must use the map(MapRDBSpark.newDocument(_)) transformation to convert the DStream object to a DStream[OJAIDocument] object.

If clicksStream is a DStream of Strings, it can be saved to MapR Database using the saveToMapRDB API:

clicksStream.map(MapRDBSpark.newDocument(_)).saveToMapRDB("/clicks", createTable = true)
NOTE To use the saveToMapRDB API, you must first transform the DStream object to a DStream[OJAIDocument] by using the Apache Spark map transformation.
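If your stream carries structured records rather than raw JSON strings, one possible approach (a sketch; the Click case class, its fields, and the clickEvents stream are illustrative assumptions) is to render each record to a JSON string before converting it to an OJAIDocument, since MapRDBSpark.newDocument accepts a JSON string:

```scala
// Sketch only: assumes the MapR Database OJAI Connector for Apache Spark
// is on the classpath and clickEvents is a DStream[Click].
import com.mapr.db.spark._
import com.mapr.db.spark.streaming._

case class Click(_id: String, url: String, ts: Long)

// Serialize each record to JSON, then convert to an OJAIDocument.
val docs = clickEvents.map { c =>
  MapRDBSpark.newDocument(
    s"""{"_id": "${c._id}", "url": "${c.url}", "ts": ${c.ts}}""")
}
docs.saveToMapRDB("/clicks", createTable = true)
```

In production code you would typically use a JSON library rather than string interpolation, so that field values are properly escaped.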