Configuring the Kafka Storage Plugin
The Kafka storage plugin is not officially supported for Drill; however, if you choose to configure Kafka as a data source in Drill, you must update the <drill_home>/jars/3rdParty directory such that it contains the required JAR files and then restart Drill before you configure the kafka storage plugin in the Drill Web UI.
Verify that the nodes in your cluster meet the requirements and then complete the steps listed.
Requirements
The Kafka storage plugin requires:
- A MapR 6.1 cluster.
- Drill 1.14 installed on nodes.
- The MapR Kafka client package (kafka-1.1.1) installed on at least one node. The Kafka client installation provides the following kafka JAR files that you copy into the <drill_home>/jars/3rdParty directory (step 4):
  - kafka_2.11-1.1.1-mapr-1808.jar
  - kafka-clients-1.1.1-mapr-1808.jar
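Before you start, it can help to confirm that the two Kafka client JAR files are actually present on the node. The following is a minimal sketch, not part of the product: the function name check_kafka_jars is hypothetical, and the default directory is assumed to be the /opt/mapr/kafka/kafka-1.1.1/libs path that the copy step uses.

```shell
#!/bin/sh
# Hypothetical helper (not a MapR or Drill tool): report whether the two
# Kafka client JAR files named in the requirements exist in a directory.
# $1 is the directory to check; the default is an assumption based on the
# path used in the copy step. Returns 0 only if both files are found.
check_kafka_jars() {
    libs_dir="${1:-/opt/mapr/kafka/kafka-1.1.1/libs}"
    missing=0
    for jar in kafka_2.11-1.1.1-mapr-1808.jar kafka-clients-1.1.1-mapr-1808.jar; do
        if [ -f "$libs_dir/$jar" ]; then
            echo "found:   $libs_dir/$jar"
        else
            echo "missing: $libs_dir/$jar"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_kafka_jars with no argument checks the default directory; pass a different directory as the first argument if the client is installed elsewhere.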
Steps
Complete the following steps to query Kafka Streams from Drill:
NOTE Do not perform step 2 if you installed Drill using the MapR RPM or Debian packages. Step 2 is only required if you installed Drill using a TAR file.
1. Remove the specified JAR files from the <drill_home>/jars/3rdParty directory based on the Drill installation method:
   - If you installed Drill using MapR RPM or Debian packages, only remove JAR files that start with kafka, such as kafka-clients-<version>.jar and kafka_<version>.jar, from the <drill_home>/jars/3rdParty directory.
   - If you installed Drill using a TAR file, remove all the JAR files that start with mapr or kafka, such as maprdb-<version>-mapr.jar, maprfs-<version>-mapr.jar, kafka_<version>-mapr.jar, and kafka-clients-<version>.jar, from the <drill_home>/jars/3rdParty directory.
2. (Only perform this step if you installed Drill using a TAR file.) Copy the following JAR files from the /opt/mapr/lib directory into the <drill_home>/jars/3rdParty directory:
   - maprdb-6.1.0-mapr.jar
   - maprdb-6.1.0-mapr-tests.jar
   - maprfs-6.1.0-mapr.jar
   - maprfs-6.1.0-mapr-tests.jar
   - mapr-hbase-6.1.0-mapr.jar
   - mapr-hbase-6.1.0-mapr-tests.jar
   - mapr-streams-6.1.0-mapr.jar
3. Copy the mapr-streams-6.1.0-mapr.jar file from the /opt/mapr/lib directory into the <drill_home>/jars/3rdParty directory.
4. Copy the following kafka JAR files from the /opt/mapr/kafka/kafka-1.1.1/libs directory into the <drill_home>/jars/3rdParty directory:
   - kafka_2.11-1.1.1-mapr-1808.jar
   - kafka-clients-1.1.1-mapr-1808.jar
5. Issue the following command to restart Drill:
   $ maprcli node services -name drill-bits -action restart -nodes <node hostnames separated by a space>
6. Log in to the Drill Web UI, and configure the kafka storage plugin. See Kafka Storage Plugin for instructions.
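The JAR removal and copy steps above can be sketched as a single shell function. This is an illustrative sketch only, under stated assumptions: the function name prepare_drill_kafka_jars and its arguments are hypothetical, the default source directories are the ones named in the steps, and the function neither restarts Drill nor configures the plugin.

```shell
#!/bin/sh
# Hypothetical helper (not a MapR or Drill tool). Arguments:
#   $1  Drill home directory (the <drill_home> placeholder)
#   $2  "package" for an RPM/Debian install, "tar" for a TAR install
#   $3  MapR lib directory (assumed default: /opt/mapr/lib)
#   $4  Kafka client libs directory
#       (assumed default: /opt/mapr/kafka/kafka-1.1.1/libs)
prepare_drill_kafka_jars() {
    drill_home="$1"
    install_type="$2"
    mapr_lib="${3:-/opt/mapr/lib}"
    kafka_libs="${4:-/opt/mapr/kafka/kafka-1.1.1/libs}"
    third_party="$drill_home/jars/3rdParty"

    case "$install_type" in
        package|tar) ;;
        *)
            echo "usage: prepare_drill_kafka_jars <drill_home> package|tar" >&2
            return 2
            ;;
    esac
    [ -d "$third_party" ] || { echo "no such directory: $third_party" >&2; return 1; }

    # Remove existing JAR files whose names start with kafka.
    rm -f "$third_party"/kafka*.jar

    if [ "$install_type" = tar ]; then
        # TAR install only: also remove JAR files that start with mapr,
        # then copy the MapR 6.1.0 JAR files listed in the steps.
        rm -f "$third_party"/mapr*.jar
        for jar in maprdb-6.1.0-mapr.jar maprdb-6.1.0-mapr-tests.jar \
                   maprfs-6.1.0-mapr.jar maprfs-6.1.0-mapr-tests.jar \
                   mapr-hbase-6.1.0-mapr.jar mapr-hbase-6.1.0-mapr-tests.jar \
                   mapr-streams-6.1.0-mapr.jar; do
            cp "$mapr_lib/$jar" "$third_party/" || return 1
        done
    fi

    # Both install types: copy the MapR Streams JAR file.
    cp "$mapr_lib/mapr-streams-6.1.0-mapr.jar" "$third_party/" || return 1

    # Both install types: copy the two Kafka client JAR files.
    for jar in kafka_2.11-1.1.1-mapr-1808.jar kafka-clients-1.1.1-mapr-1808.jar; do
        cp "$kafka_libs/$jar" "$third_party/" || return 1
    done
}
```

For example, prepare_drill_kafka_jars "$DRILL_HOME" package would handle an RPM or Debian installation, where DRILL_HOME is whatever variable holds your Drill home directory. Restart Drill with the maprcli command afterward so that the new JAR files are loaded.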
NOTE When configuring the kafka storage plugin, you must also include the following parameter in the storage plugin configuration:
"streams.consumer.default.stream": "<path-to-stream>"
Usage Example
This example shows a Drill query on a MapR Streams data set made accessible to Drill through the kafka storage plugin.
For this example, tables that contain Yelp stream topics reside in a directory named /YelpStream. The kafka storage plugin is configured with the "streams.consumer.default.stream" parameter pointing to the /YelpStream directory, as shown:
"streams.consumer.default.stream": "/YelpStream"
The USE command tells Drill to access data from only the kafka data source:
use kafka;
+------+-----------------------------------+
| ok   | summary                           |
+------+-----------------------------------+
| true | Default schema changed to [kafka] |
+------+-----------------------------------+
The SHOW TABLES command lists the tables in the /YelpStream directory configured for the kafka data source:
show tables;
+--------------+---------------------------+
| TABLE_SCHEMA | TABLE_NAME                |
+--------------+---------------------------+
| kafka        | /YelpStream:UserTable     |
| kafka        | /YelpStream:ReviewTable   |
| kafka        | /YelpStream:BusinessTable |
+--------------+---------------------------+
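The table names in this listing appear to follow the form <stream path>:<name>. Because such names contain the / and : characters, a query must wrap them in backticks. The following is a minimal sketch of building such a statement in a shell script; the helper names stream_table_ref and stream_select are hypothetical and are not part of Drill.

```shell
#!/bin/sh
# Hypothetical helper: build a backtick-quoted table reference from a
# stream path ($1) and a table name ($2).
stream_table_ref() {
    printf '`%s:%s`' "$1" "$2"
}

# Hypothetical helper: build a SELECT statement that returns at most $3
# rows from the table identified by stream path $1 and table name $2.
stream_select() {
    printf 'select * from %s limit %s;' "$(stream_table_ref "$1" "$2")" "$3"
}
```

For example, stream_select /YelpStream BusinessTable 1 prints a statement that selects one row from the BusinessTable listed above.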
The query selects all the data from the BusinessTable in the /YelpStream directory, limiting the results to one row of data:
select * from `/YelpStream:BusinessTable` limit 1;
+---+----------+-----------+----------+----+------------+-----+--------+---------+----+-------------+----+------------+-----+-----+----+----------+----------------+--------------+-----------------+-----------+
| _id | attributes | business_id | categories | city | full_address | hours | latitude | longitude | name | neighborhoods | open | review_count | stars | state | type | kafkaTopic | kafkaPartitionId | kafkaMsgOffset | kafkaMsgTimestamp | kafkaMsgKey |
+---+----------+-----------+----------+----+------------+-----+--------+---------+----+-------------+----+------------+-----+-----+----+----------+----------------+--------------+-----------------+-----------+
| --1emggGHgoG6ipd_RMb-g | {"Accepts Credit Cards":"true","Parking":{"garage":"false","lot":"true","street":"false","valet":"false","validated":"false"},"Price Range":"1","Ambience":{},"Good For":{},"Music":{}} | --1emggGHgoG6ipd_RMb-g | ["Food","Convenience Stores"] | Las Vegas | 3280 S Decatur Blvd
Westside
Las Vegas, NV 89102 | {"Friday":{},"Monday":{},"Saturday":{},"Sunday":{},"Thursday":{},"Tuesday":{},"Wednesday":{}} | 36.1305306 | -115.2072382 | Sinclair | ["Wes