Step 2: Write Data to MapR

Depending on your use case, move existing data onto the platform or write data directly to the platform.

You can write batch data or streaming data to the MapR Converged Data Platform. Batch data is data that already resides in a data store, while streaming data is a continuous flow of real-time messages that have not yet been written to a data store. Streaming data is generally processed as it is received, whereas batch data is processed after a set of data has been written to the data store. There are many ways to write batch and streaming data to the platform; the following sections provide a few examples.

Write Batch Data to the Platform

You can use an NFS client, hadoop commands, or Hadoop ecosystem components to write batch data to MapR-FS. Basic POSIX file system operations can move data to MapR-FS. For example, you can access the file system with NFS clients, POSIX clients, or applications that use libraries such as java.io. Hadoop commands and HDFS APIs can add or update files in MapR-FS; for example, you can use the hadoop distcp command to copy data from HDFS to MapR-FS. Hadoop ecosystem components, such as Apache Flume, can also push log files to MapR-FS.
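Because MapR-FS can be reached through ordinary POSIX file operations, a batch writer needs no MapR-specific client. The following Python sketch writes rows to a CSV file using only standard file I/O. It assumes the cluster is NFS-mounted at the conventional /mapr/<cluster-name> location. The cluster name my.cluster.com, the target directory, and the write_batch helper are hypothetical placeholders, not part of any MapR API.

```python
import csv
from pathlib import Path

# Assumption: the cluster is NFS-mounted under /mapr/<cluster-name>.
# "my.cluster.com" and the directory below are hypothetical placeholders.
DEFAULT_BASE = Path("/mapr/my.cluster.com/user/alice/ingest")


def write_batch(rows, base_dir=DEFAULT_BASE, name="batch-0001.csv"):
    """Write an iterable of rows to a CSV file with plain file I/O.

    Nothing here is MapR-specific: the mounted cluster looks like any other
    directory, so the same call works against a local or temporary path.
    """
    target_dir = Path(base_dir)
    # Create the target directory (and any parents) if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    # newline="" lets the csv module control line endings, as its docs require.
    with open(target, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)
    return target
```

For example, write_batch([("symbol", "price"), ("ABC", "101.25")]) would create batch-0001.csv under the assumed mount point. Because the function only touches a path, it can be exercised against a temporary directory before it is pointed at the mounted cluster.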

You can also write, update, or delete batch data in MapR-DB tables. Applications can use the OJAI API to write to JSON tables or the HBase API to write to binary tables.

Write Streaming Data to the Platform

Write streaming event data as messages to MapR Streams topics using the Kafka API or a REST client application. C, Java, or Python applications can use the Kafka API to produce messages to one or more topics in a stream. Additionally, applications written in any language can use the REST Proxy to produce messages to one or more topics in a stream. For example, a financial services application written in Java could produce messages about stock market activity to a MapR Streams topic.
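Any language with an HTTP client can produce through the REST Proxy. The Python sketch below only builds the produce request, so its shape is easy to inspect. It assumes a proxy listening at http://localhost:8082, the Kafka REST Proxy v1 JSON content type, and a topic addressed by its full MapR Streams name (<stream path>:<topic>, percent-encoded in the URL). The stream /apps/market, the topic trades, and the build_produce_request helper are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the REST Proxy accepts the Kafka REST Proxy v1 JSON format.
CONTENT_TYPE = "application/vnd.kafka.json.v1+json"


def build_produce_request(proxy_url, stream_path, topic, values):
    """Build, but do not send, a POST that produces each item in values as a JSON record.

    A MapR Streams topic is addressed as "<stream path>:<topic>". The whole name
    is percent-encoded so its slashes and colon stay in a single URL path segment.
    """
    full_topic = f"{stream_path}:{topic}"
    url = f"{proxy_url.rstrip('/')}/topics/{urllib.parse.quote(full_topic, safe='')}"
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": CONTENT_TYPE}
    )


if __name__ == "__main__":
    # Hypothetical proxy address, stream path, and topic name.
    req = build_produce_request(
        "http://localhost:8082", "/apps/market", "trades",
        [{"symbol": "ABC", "price": 101.25}],
    )
    print(req.get_method(), req.full_url)
    # To actually send it to a running proxy:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload testable without a running proxy. To produce the records, pass the request to urllib.request.urlopen, as in the commented-out lines.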