Run Hive Jobs with Oozie
IMPORTANT: This component is deprecated. Hewlett Packard Enterprise recommends using an alternate product. For more information, see Discontinued Ecosystem Components.
You run Hive jobs with Oozie by configuring a Hive workflow.
You can configure Oozie to run a Hive workflow by connecting either to the Hive Metastore or to HiveServer2.

Configure a Hive Workflow with Connection to Hive Metastore
- To use a metastore server for the Hive job, add the following property to the hive-site.xml file:

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://<IP address>:<port></value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
- Copy the edited hive-site.xml file to the same location as your workflow.xml file.
- If you are using the Hive-on-Tez engine and you have changed the default tez-site.xml configuration, perform one of the following steps:
  - Copy the tez-site.xml file to the same location as your workflow.xml file:
    - Remove the property that is forbidden for Oozie from tez-site.xml:

      <property>
        <name>fs.defaultFS</name>
        <value>maprfs:///</value>
      </property>
    - Make sure that you update the value of the tez.lib.uris property after removing the fs.defaultFS property. For example:

      tez.lib.uris=maprfs:///apps/tez/tez-<version>,maprfs:///apps/tez/tez-<version>/lib
    - Specify the tez-site.xml file in the job-xml parameter of your workflow.
  - Update <OOZIE_HOME>/conf/action-conf/hive.xml with the new tez-site.xml properties.
- Edit the workflow.xml file to include the following:
  - Specify the hive-site.xml file in the job-xml parameter.
  - Specify the name of the script (for example, script.q) that contains the Hive query in the script parameter.
  - Optionally, add properties used by the Oozie launcher job. Add the prefix oozie.launcher to the property names.
  <workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
    <start to="hive-node"/>
    <action name="hive-node">
      <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
          <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/hive"/>
          <mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
        </prepare>
        <job-xml>hive-site.xml</job-xml>
        <!-- Add this property if you copied tez-site.xml to the same location as your workflow.xml file -->
        <job-xml>tez-site.xml</job-xml>
        <configuration>
          <property>
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>
          </property>
        </configuration>
        <script>script.q</script>
        <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/table</param>
        <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/hive</param>
      </hive>
      <ok to="end"/>
      <error to="fail"/>
    </action>
    <kill name="fail">
      <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
  </workflow-app>
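The workflow passes INPUT and OUTPUT to the Hive script named in the script element. As an illustration only (the table name and query below are hypothetical, not part of this example), a minimal script.q could look like:

```sql
-- Hypothetical script.q: ${INPUT} and ${OUTPUT} are substituted from the
-- <param> elements of the Oozie workflow at run time.
CREATE EXTERNAL TABLE IF NOT EXISTS test (a INT)
  STORED AS TEXTFILE
  LOCATION '${INPUT}';
INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM test;
```

Because the parameters are substituted before the script runs, the same script can be reused against different input and output paths.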
Configure a Hive Workflow with Connection to HiveServer2

- Copy the edited hive-site.xml file to the same location as your workflow.xml file.
- On a Kerberos-secured cluster running Oozie 4.3.0, perform the following steps:
  - Copy the hive-site.xml file to the ${OOZIE_HOME}/conf/action-conf/ directory.
  - Rebuild the Oozie war file:

    /opt/mapr/oozie/oozie-<version>/bin/oozie-setup.sh -hadoop <version> /opt/mapr/hadoop/hadoop-<version>
- Edit the workflow.xml file to include the following:
  - Specify the JDBC URL used by Beeline for connections to HiveServer2 in the jdbc-url element. See Connecting to HiveServer2 for details.
  - Specify the name of the script (for example, script.q) that contains the Hive query in the script element.

  <?xml version="1.0" encoding="UTF-8"?>
  <workflow-app xmlns="uri:oozie:workflow:0.5" name="hive2-wf">
    <start to="hive2-node"/>
    <action name="hive2-node">
      <hive2 xmlns="uri:oozie:hive2-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
          <delete path="${nameNode}/user/${wf:user()}/output-data/hive2"/>
          <mkdir path="${nameNode}/user/${wf:user()}/output-data"/>
        </prepare>
        <configuration>
          <property>
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>
          </property>
        </configuration>
        <jdbc-url>jdbc:hive2://localhost:10000/default</jdbc-url>
        <script>script.q</script>
        <param>INPUT=/user/${wf:user()}/input-data/table</param>
        <param>OUTPUT=/user/${wf:user()}/output-data/hive2</param>
      </hive2>
      <ok to="end"/>
      <error to="fail"/>
    </action>
    <kill name="fail">
      <message>Hive2 (Beeline) action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
  </workflow-app>
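A workflow such as this is typically launched with a job.properties file that defines the variables referenced in workflow.xml. The following sketch is an assumption for illustration; the property values and the application path depend on your cluster:

```properties
# Hypothetical job.properties for the hive2 workflow; adjust values for your cluster.
nameNode=maprfs:///
jobTracker=maprfs:///
queueName=default
oozie.use.system.libpath=true
# Directory that contains workflow.xml, hive-site.xml, and script.q
oozie.wf.application.path=${nameNode}/user/${user.name}/hive2-wf
```

You can then submit the workflow with the Oozie CLI, for example: oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run (11000 is the default Oozie server port; adjust it for your installation).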