Configuring the Hive Storage Plugin

About this task

You can connect Drill to a Hive data source by configuring the Hive storage plugin in the Drill Web UI. After you configure the plugin, you can use Drill to query data stored in Hive.

Drill can work with only one version of Hive in a given cluster. To access Hive tables using custom SerDes or InputFormat/OutputFormat, all nodes running Drill must have the SerDes or InputFormat/OutputFormat JAR files in the <drill_installation_directory>/jars/3rdparty location.

To query across multiple versions of Hive, install each version of Hive on a separate Drill cluster. You must define separate storage plugins, each corresponding to the specific Hive version of the metastore.
NOTE In EEP 6.0, Drill requires Hive version 2.3.3-mapr or later to successfully query Hive data sources.
NOTE You can update the Hive storage plugin configuration through the configuration script, configure.sh. If the Hive storage plugin is disabled, and the configuration in the Drill Web UI displays “null,” you must rerun configure.sh with the -hiveMetastoreHost argument. See configure.sh for details.

Configuring a Hive Remote Metastore

A remote Hive metastore runs as a separate service outside of Hive. The metastore service communicates with the Hive database over JDBC. To connect Drill to Hive, point Drill to the Hive metastore service address and provide the connection parameters in the Hive storage plugin configuration. The Hive storage plugin (located on the Storage tab in the Drill Web UI) has the following default configuration when you install Drill:
{
 "type": "hive",
 "enabled": true,
 "configProps": {
  "hive.metastore.uris": "", 
  "javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=../sample-data/drill_hive_db;create=true", 
  "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh", 
  "fs.default.name": "file:///", 
  "hive.metastore.sasl.enabled": "false", 
  "datanucleus.schema.autoCreateAll": "true" 
 }
}

Complete the following steps to modify the default Hive storage plugin configuration for your MapR File System environment:

Procedure

  1. Verify that Hive is running.
  2. Issue the following command to start the Hive metastore service on the system specified in the hive.metastore.uris property: hive --service metastore
  3. Start the Drill Web UI.
  4. Select the Storage tab. If Web UI security is enabled, you must have administrator privileges to perform this step.
  5. In the list of disabled storage plugins in the Drill Web UI, click Update next to Hive.
  6. Update the following Hive storage plugin parameters to match the system environment:
    • "hive.metastore.uris": "thrift://<host>:<port>"
    • "javax.jdo.option.ConnectionURL": "jdbc:<database>://<host:port>/<metastore database>"
    • Change the default location of files to suit your environment. For example, change "fs.default.name": "file:///" to the MapR File System location: maprfs:///
    • To run Drill and Hive in a secure MapR cluster, change the "hive.metastore.sasl.enabled" parameter to "true".
    • Set the "datanucleus.schema.autoCreateAll" property to suit your system environment. When set to "true", "datanucleus.schema.autoCreateAll" initializes the Hive metastore schema.
      • In a production environment, remove the "datanucleus.schema.autoCreateAll" property from the Hive storage plugin configuration; the property is not required because the preferred schema information is already created for the Hive metastore service.
      • In a test environment with an embedded Hive metastore, you can disable (set to false) this property after the first query on the Hive data source that you submit from Drill. Alternatively, use the Hive schema tool to initialize or upgrade the Hive metastore schema. Using the Hive schema tool is recommended for queries on transactional tables. Run the schematool command as an initialization step:
        /opt/mapr/hive/hive-<version>/bin/schematool -dbType <databaseType> -initSchema
  7. Click Enable in the Web UI to enable the Hive storage plugin configuration.
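
For reference, the following is a minimal sketch of what the storage plugin configuration might look like after step 6 for a production-style remote metastore on MapR File System. The host names (metastore-host.example.com, db-host.example.com), the MySQL backing database, the metastore database name, and the warehouse directory are hypothetical placeholders, not values from this procedure. The ports shown are the usual Hive metastore and MySQL defaults and may differ on your system. The "datanucleus.schema.autoCreateAll" property is omitted, as recommended for production environments.

```json
{
 "type": "hive",
 "enabled": true,
 "configProps": {
  "hive.metastore.uris": "thrift://metastore-host.example.com:9083",
  "javax.jdo.option.ConnectionURL": "jdbc:mysql://db-host.example.com:3306/metastore",
  "hive.metastore.warehouse.dir": "/user/hive/warehouse",
  "fs.default.name": "maprfs:///",
  "hive.metastore.sasl.enabled": "false"
 }
}
```

On a secure MapR cluster, "hive.metastore.sasl.enabled" would be set to "true" instead.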