MapR 6.0 Documentation

Kafka Connect 2.0.1: Hive Integration

This topic describes how to integrate a Hive database with Kafka Connect for MapR-ES.

Kafka Connect for MapR-ES supports Hive integration. When Hive integration is enabled, an external Hive table is created, which you can query from the Hive shell.

NOTE: Kafka Connect for MapR-ES supports Hive 1.2.

To implement Hive integration:

On all Kafka Connect nodes, run the following command to create symlinks to the required Hive JARs of the correct version:

sudo /opt/mapr/kafka-connect-hdfs/kafka-connect-hdfs-*/bin/configure.sh

The Hive table name is derived from the topic name as follows:

For a MapR-ES topic of the form /stream_path:topic-name, the first forward slash (/) is removed, all other slashes are translated to underscores (_), and the colon (:) is translated to an underscore (_).
All characters other than alphanumerics and underscores are then removed from the string representing the Hive table name.
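
The naming rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this topic, not the connector's actual implementation. The function name hive_table_name is hypothetical, and the sketch assumes that "alphanumeric" means the ASCII letters and digits.

```python
import re


def hive_table_name(topic: str) -> str:
    """Approximate the Hive table name derived from a MapR-ES topic.

    Hypothetical helper: it mirrors the documented rules only and is
    not part of Kafka Connect for MapR-ES.
    """
    # Rule 1: remove the first (leading) forward slash.
    name = topic[1:] if topic.startswith("/") else topic
    # Rule 2: translate the remaining slashes and the colon to underscores.
    name = name.replace("/", "_").replace(":", "_")
    # Rule 3: remove everything that is not alphanumeric or an underscore
    # (assumption: ASCII letters and digits only).
    return re.sub(r"[^A-Za-z0-9_]", "", name)


if __name__ == "__main__":
    print(hive_table_name("/test-12:test1"))  # prints test12_test1
```

A helper like this can be used to predict which table name to query in the Hive shell for a given topic path.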

Example

The following example shows how a topic named /test-12:test1 is renamed for Hive usage.

$ hadoop fs -ls -R /topics
        drwxr-xr-x   - mapr mapr          1 2016-10-05 19:46 /topics/+tmp
        drwxr-xr-x   - mapr mapr          1 2016-10-05 19:46 /topics/+tmp/test12_test1
        drwxr-xr-x   - mapr mapr          0 2016-10-05 19:50 /topics/+tmp/test12_test1/partition=1
        drwxr-xr-x   - mapr mapr          1 2016-10-05 19:46 /topics/test12_test1
        drwxr-xr-x   - mapr mapr          2 2016-10-05 19:50 /topics/test12_test1/partition=1
        -rwxr-xr-x   3 mapr mapr        241 2016-10-05 19:47 /topics/test12_test1/partition=1/test12_test1+1+0000000078+0000000080.avro
        -rwxr-xr-x   3 mapr mapr        241 2016-10-05 19:50 /topics/test12_test1/partition=1/test12_test1+1+0000000081+0000000083.avro

The following query and its results show the topic data in the Hive table.

> select * from test12_test1;
        OK
        16/10/05 20:06:59 INFO mapred.FileInputFormat: Total input paths to process : 2
        18  data10  1
        18  data10  1
        18  data10  1
        18  data10  1
        18  data10  1
        18  data10  1
        Time taken: 0.128 seconds, Fetched: 6 row(s)
>