Configuring the Kafka Storage Plugin

The Kafka storage plugin is not officially supported for Drill; however, if you choose to configure Kafka as a data source in Drill, you must update the <drill_home>/jars/3rdParty directory such that it contains the required JAR files and then restart Drill before you configure the kafka storage plugin in the Drill Web UI.

Verify that the nodes in your cluster meet the requirements and then complete the steps listed.

Requirements

The Kafka storage plugin requires:
  • HPE Ezmeral Data Fabric 6.2 cluster
  • Drill 1.16.1 installed on nodes
  • The HPE Ezmeral Data Fabric Kafka client package (kafka-2.1.1 or 2.6.1) installed on at least one node. The Kafka client installation provides the following kafka JAR files that you copy into the <drill_home>/jars/3rdParty directory (step 4):
    • Kafka-2.1.1
      • kafka_2.11-2.1.1.200-mapr-710.jar
      • kafka-clients-2.1.1.200-mapr-710.jar
    • Kafka-2.6.1 (if you have eep-800 or later installed)
      • kafka_2.13-2.6.1.0-eep-800.jar
      • kafka-clients-2.6.1.0-eep-800.jar
      • kafka-eventstreams-0.1.0.0-eep-800.jar

Steps

Complete the following steps to query Kafka Streams from Drill:
NOTE Do not perform step 2 if you installed Drill using the RPM or Debian packages. Step 2 is only required if you installed Drill using a TAR file.
  1. Remove the specified JAR files from the <drill_home>/jars/3rdParty directory based on the Drill installation method:
    • If you installed Drill using RPM or Debian packages, only remove JAR files that start with kafka, such as kafka-clients-<version>.jar and kafka_<version>.jar, from the <drill_home>/jars/3rdParty directory.
    • If you installed Drill using a TAR file, remove all the JAR files that start with mapr and kafka, such as maprdb-<version>-mapr.jar, maprfs-<version>-mapr.jar, kafka_<version>-mapr.jar, and kafka-clients-<version>.jar, from the <drill_home>/jars/3rdParty directory.
  2. (Only perform this step if you installed Drill using a TAR file.) Copy the following JAR files from the /opt/mapr/lib directory into <drill_home>/jars/3rdParty directory:
    • maprdb-6.2.0.0-mapr.jar
    • maprdb-6.2.0.0-mapr-tests.jar
    • maprfs-6.2.0.0-mapr.jar
    • maprfs-6.2.0.0-mapr-tests.jar
    • mapr-hbase-6.2.0.0-mapr.jar
    • mapr-hbase-6.2.0.0-mapr-tests.jar
    • mapr-streams-6.2.0.0-mapr.jar
  3. Copy the mapr-streams-6.2.0.0-mapr.jar file from the /opt/mapr/lib directory into the <drill_home>/jars/3rdParty directory.
  4. Copy the following kafka JAR files from the /opt/mapr/kafka/kafka-*/libs directory into the <drill_home>/jars/3rdParty directory:
    • Kafka-2.1.1
      • kafka_2.11-2.1.1.200-mapr-710.jar
      • kafka-clients-2.1.1.200-mapr-710.jar
    • Kafka-2.6.1 (if you have eep-800 or later installed)
      • kafka_2.13-2.6.1.0-eep-800.jar
      • kafka-clients-2.6.1.0-eep-800.jar
      • kafka-eventstreams-0.1.0.0-eep-800.jar
  5. Issue the following command to restart Drill:
    $ maprcli node services -name drill-bits -action restart -nodes <node hostnames separated by a space>
  6. Log in to the Drill Web UI, and configure the kafka storage plugin. See Kafka Storage Plugin for instructions.
    NOTE When configuring the kafka storage plugin, you must also include the following parameter in the storage plugin configuration:
    "streams.consumer.default.stream": "<path-to-stream>"

Usage Example

This example shows a Drill query on a MaR Streams data set made accessible to Drill through the kafka storage plugin.

For this example, tables that contain Yelp stream topics reside in a directory named /YelpStream. The kakfa storage plugin is configured with the "streams.consumer.default.stream" parameter pointing to the /YelpStream directory, as shown:
"streams.consumer.default.stream": "/YelpStream" 
The USE command tells Drill to access data from only the kafka data source:
use kafka;
+-----+----------------------------------+
| ok | summary |
+-----+----------------------------------+
| true | Default schema changed to [kafka] |
+-----+----------------------------------+
The SHOW TABLES command lists the tables in the /YelpStream directory configured for the kafka data source:
show tables;
+-------------+---------------------------+
| TABLE_SCHEMA | TABLE_NAME |
+-------------+---------------------------+
| kafka | /YelpStream:UserTable |
| kafka | /YelpStream:ReviewTable |
| kafka | /YelpStream:BusinessTable |
+-------------+---------------------------+
The query selects all the data from the BusinessTable in the /YelpStream directory, limiting the results to one row data:
select * from `/YelpStream:BusinessTable` limit 1;
+---+----------+-----------+----------+----+------------+-----+--------+---------+----+-------------+----+------------+-----+-----+----+----------+----------------+--------------+-----------------+-----------+
| _id | attributes | business_id | categories | city | full_address | hours | latitude | longitude | name | neighborhoods | open | review_count | stars | state | type | kafkaTopic | kafkaPartitionId | kafkaMsgOffset | kafkaMsgTimestamp | kafkaMsgKey |
+---+----------+-----------+----------+----+------------+-----+--------+---------+----+-------------+----+------------+-----+-----+----+----------+----------------+--------------+-----------------+-----------+
| --1emggGHgoG6ipd_RMb-g | {"Accepts Credit Cards":"true","Parking":{"garage":"false","lot":"true","street":"false","valet":"false","validated":"false"},"Price Range":"1","Ambience":{},"Good For":{},"Music":{}} | --1emggGHgoG6ipd_RMb-g | ["Food","Convenience Stores"] | Las Vegas | 3280 S Decatur Blvd
Westside
Las Vegas, NV 89102 | {"Friday":{},"Monday":{},"Saturday":{},"Sunday":{},"Thursday":{},"Tuesday":{},"Wednesday":{}} | 36.1305306 | -115.2072382 | Sinclair | ["Wes