Hive Integration

This topic describes how to integrate a Hive database with Kafka Connect for HPE Ezmeral Data Fabric Streams.

Kafka Connect for HPE Ezmeral Data Fabric Streams supports Hive integration. If a Hive database is enabled, an external Hive table is created and that can be queried via Hive shell.

NOTE As of Kafka Connect 4.0.0 for HPE Ezmeral Data Fabric Streams Hive 2.1 is supported.

The Hive table name is constructed using a topic name in the following manner:

  • In the HPE Ezmeral Data Fabric Streams topic, /stream_path:topic-name, the first forward slash (/) is removed, all other slashes are translated to underscores ( _ ), and the colon (:) is translated to an underscore (_).
  • All non-alphanumeric and non-underscore characters are removed from the string representing the Hive table name.

Renaming Topics for Hive usage

The following example shows a topic named /test-12:test1 is renamed for Hive usage.

$ hadoop fs -ls -R /topics
        drwxr-xr-x   - mapr mapr          1 2016-10-05 19:46 /topics/+tmp
        drwxr-xr-x   - mapr mapr          1 2016-10-05 19:46 /topics/+tmp/test12_test1
        drwxr-xr-x   - mapr mapr          0 2016-10-05 19:50 /topics/+tmp/test12_test1/partition=1
        drwxr-xr-x   - mapr mapr          1 2016-10-05 19:46 /topics/test12_test1
        drwxr-xr-x   - mapr mapr          2 2016-10-05 19:50 /topics/test12_test1/partition=1
        -rwxr-xr-x   3 mapr mapr        241 2016-10-05 19:47 /topics/test12_test1/partition=1/test12_test1+1+0000000078+0000000080.avro
        -rwxr-xr-x   3 mapr mapr        241 2016-10-05 19:50 /topics/test12_test1/partition=1/test12_test1+1+0000000081+0000000083.avro
      

The following query and results shows the topic data in the Hive table.

> select * from test12_test1;
        OK
        16/10/05 20:06:59 INFO mapred.FileInputFormat: Total input paths to process : 2
        18  data10  1
        18  data10  1
        18  data10  1
        18  data10  1
        18  data10  1
        18  data10  1
        Time taken: 0.128 seconds, Fetched: 6 row(s)
>

Streaming Data from HPE Ezmeral Data Fabric Streams to the Hive database

This example provides sample code for streaming data from HPE Ezmeral Data Fabric Streams to the Hive database.

POST /connectors HTTP/1.1
        Host: connect.example.com
        Content-Type: application/json
        Accept: application/json
        
        {
        "name":"hdfs-connector-hive",
          "config":{  
          "hive.integration":"true",
          "hive.database":"db3",
            "hive.conf.dir":"/opt/mapr/hive/hive-1.2/conf",
          "hive.metastore.uris":"thrift://localhost:9083",
          "schema.compatibility":"BACKWARD",
          "connector.class":"io.confluent.connect.hdfs.HdfsSinkConnector",
          "tasks.max":"1",
          "topics":"/kafka-connect:topic3",
          "hdfs.url":"maprfs:///",
          "flush.size":"1"
          }
        }