Kafka Connect 2.0.1: Hive Integration
This topic describes how to integrate a Hive database with Kafka Connect for MapR-ES.
Kafka Connect for MapR-ES supports Hive integration. If a Hive database is enabled, an external Hive table is created and that can be queried via Hive shell.
To implement Hive integration:
On all Kafka Connect nodes, execute the following to create symlinks of the required Hive jars of the correct version:
sudo /opt/mapr/kafka-connect-hdfs/kafka-connect-hdfs-*/bin/configure.sh
The Hive table name is constructed using a topic name in the following manner:
- In the MapR-ES topic, /stream_path:topic-name, the first forward slash (/) is removed, all other slashes are translated to underscores ( _ ), and the colon (:) is translated to an underscore (_).
- All non-alphanumeric and non-underscore characters are removed from the string representing the Hive table name.
Example
The following example shows a topic named /test-12:test1 is renamed for Hive usage.
$ hadoop fs -ls -R /topics
drwxr-xr-x - mapr mapr 1 2016-10-05 19:46 /topics/+tmp
drwxr-xr-x - mapr mapr 1 2016-10-05 19:46 /topics/+tmp/test12_test1
drwxr-xr-x - mapr mapr 0 2016-10-05 19:50 /topics/+tmp/test12_test1/partition=1
drwxr-xr-x - mapr mapr 1 2016-10-05 19:46 /topics/test12_test1
drwxr-xr-x - mapr mapr 2 2016-10-05 19:50 /topics/test12_test1/partition=1
-rwxr-xr-x 3 mapr mapr 241 2016-10-05 19:47 /topics/test12_test1/partition=1/test12_test1+1+0000000078+0000000080.avro
-rwxr-xr-x 3 mapr mapr 241 2016-10-05 19:50 /topics/test12_test1/partition=1/test12_test1+1+0000000081+0000000083.avro
The following query and results shows the topic data in the Hive table.
> select * from test12_test1;
OK
16/10/05 20:06:59 INFO mapred.FileInputFormat: Total input paths to process : 2
18 data10 1
18 data10 1
18 data10 1
18 data10 1
18 data10 1
18 data10 1
Time taken: 0.128 seconds, Fetched: 6 row(s)
>