ACID Table Upgrade Routine

Contains a procedure that must be followed if your installation of Hive 2.x includes ACID tables and you want to upgrade from Hive 2.x to 3.x. If Hive is upgraded from 2.x to 3.x without performing these steps, data in the ACID tables will be corrupted during the upgrade.

Prerequisites

The following steps assume:
  • A cluster with release 7.0.0 and EEP 8.1.0.
  • The cluster is running Hive 2.3 and Hadoop 2.7.
  • Derby is not used as the Hive Metastore backend database.
  • The Hive Upgrade ACID Tool JAR has been downloaded to the Hive 2.x installation node.
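
A quick way to sanity-check these assumptions on the Hive 2.x node is sketched below. The hive-site.xml location shown is an assumption based on a default installation layout; adjust the paths for your environment.
  # Confirm the installed Hive and Hadoop versions
  $ hive --version
  $ hadoop version
  # Confirm that the metastore backend is not Derby: the JDBC URL should not reference Derby
  $ grep -A1 "javax.jdo.option.ConnectionURL" /opt/mapr/hive/hive-2.3/conf/hive-site.xml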

Considerations for Running the Tool

Note these considerations:
  • You must run the Upgrade ACID Tool before upgrading any cluster package.
  • You must run the Upgrade ACID Tool on a live cluster.
  • Before running the Upgrade ACID Tool, stop the hs2 (HiveServer2) service so that no clients can access the tables while the tool runs.
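
As an optional sanity check after stopping the service (see step 1 below), you can confirm that no HiveServer2 process is still running on the node. This check is a suggestion, not part of the official procedure:
  # Should print nothing if HiveServer2 is fully stopped
  $ ps -ef | grep -i "[h]iveserver2"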

ACID Table Upgrade Steps

Use these steps to run the tool:
  1. Stop the hs2 service:
    $ maprcli node services -action stop -nodes `hostname -f` -name hs2; 
  2. Run the Upgrade ACID Tool. Modify the following paths in the run command to match the environment (a parameterized wrapper script is sketched after these steps):
    • Path to the Hive 2.x installation: /opt/mapr/hive/hive-<old_hive_version>
    • Path to the Hadoop 2.7 installation: /opt/mapr/hadoop/hadoop-<old_hadoop_version>
    • Path to the upgrade tool JAR file: <path_to>/hive-upgrade-acid-<new_hive_version>-eep-900.jar
    Here is the command syntax:
    $ java -cp /opt/mapr/lib/*:/opt/mapr/hive/hive-<old_hive_version>/lib/*:/opt/mapr/hive/hive-<old_hive_version>/conf/*:/opt/mapr/hadoop/hadoop-<old_hadoop_version>/lib/*:/opt/mapr/hadoop/hadoop-<old_hadoop_version>/etc/hadoop/*:/opt/mapr/hadoop/hadoop-<old_hadoop_version>/share/hadoop/yarn/sources/*:/opt/mapr/hadoop/hadoop-<old_hadoop_version>/share/hadoop/mapreduce/*:/opt/mapr/hadoop/hadoop-<old_hadoop_version>/share/hadoop/mapreduce/sources/*:/opt/mapr/hadoop/hadoop-<old_hadoop_version>/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-<old_hadoop_version>/share/hadoop/hdfs/sources/*:/home/mapr/hive-upgrade-acid-<new_hive_version>-eep-900.jar org.apache.hadoop.hive.upgrade.acid.UpgradeTool -preUpgrade -execute 
    If the path values are as follows:
    • Path to the Hive 2.x installation: /opt/mapr/hive/hive-2.3
    • Path to the Hadoop 2.7 installation: /opt/mapr/hadoop/hadoop-2.7.6
    • Path to the upgrade tool JAR file: /home/mapr/acid-test/hive-upgrade-acid-3.1.3.0-eep-900.jar
    Here's an example:
    $ java -cp /opt/mapr/lib/*:/opt/mapr/hive/hive-2.3/lib/*:/opt/mapr/hive/hive-2.3/conf/*:/opt/mapr/hadoop/hadoop-2.7.6/lib/*:/opt/mapr/hadoop/hadoop-2.7.6/etc/hadoop/*:/opt/mapr/hadoop/hadoop-2.7.6/share/hadoop/yarn/sources/*:/opt/mapr/hadoop/hadoop-2.7.6/share/hadoop/mapreduce/*:/opt/mapr/hadoop/hadoop-2.7.6/share/hadoop/mapreduce/sources/*:/opt/mapr/hadoop/hadoop-2.7.6/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.7.6/share/hadoop/hdfs/sources/*:/home/mapr/acid-test/hive-upgrade-acid-3.1.3.0-eep-900.jar org.apache.hadoop.hive.upgrade.acid.UpgradeTool -preUpgrade -execute 
    Note that the -preUpgrade and -execute flags are mandatory.
  3. Continue the cluster and Hive upgrade procedures. At this point, the ACID tables are ready for use by Hive 3.x, and no further ACID upgrade actions are required.
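
Because the classpath in step 2 is long, it can be convenient to parameterize it in a small wrapper script. The following sketch only illustrates that approach; it mirrors the classpath entries from the command template above, and the version values and JAR location are assumptions that you must adjust for your environment.
  #!/bin/bash
  # Wrapper sketch for the Upgrade ACID Tool run command (illustration only).
  # Adjust these three values to match your environment.
  OLD_HIVE=/opt/mapr/hive/hive-2.3
  OLD_HADOOP=/opt/mapr/hadoop/hadoop-2.7.6
  UPGRADE_JAR=/home/mapr/acid-test/hive-upgrade-acid-3.1.3.0-eep-900.jar
  # Same classpath entries, in the same order, as the command template in step 2.
  CP="/opt/mapr/lib/*:${OLD_HIVE}/lib/*:${OLD_HIVE}/conf/*"
  CP="${CP}:${OLD_HADOOP}/lib/*:${OLD_HADOOP}/etc/hadoop/*"
  CP="${CP}:${OLD_HADOOP}/share/hadoop/yarn/sources/*"
  CP="${CP}:${OLD_HADOOP}/share/hadoop/mapreduce/*:${OLD_HADOOP}/share/hadoop/mapreduce/sources/*"
  CP="${CP}:${OLD_HADOOP}/share/hadoop/hdfs/*:${OLD_HADOOP}/share/hadoop/hdfs/sources/*"
  CP="${CP}:${UPGRADE_JAR}"
  # Both -preUpgrade and -execute are mandatory.
  java -cp "${CP}" org.apache.hadoop.hive.upgrade.acid.UpgradeTool -preUpgrade -execute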

Troubleshooting

This section describes common problems that can occur during the ACID table upgrade and how to resolve them:

Problem
The Hive Upgrade ACID Tool finishes almost instantly with the following log messages (the example log is trimmed for readability):
INFO  [main] acid.UpgradeTool - No compaction is necessary 
INFO  [main] acid.UpgradeTool - No acid conversion is necessary 
INFO  [main] acid.UpgradeTool - No managed table conversion is necessary 
INFO  [main] acid.UpgradeTool - No file renaming is necessary 
Solution
These log messages do not necessarily indicate a problem. Even though ACID tables are present, the upgrade tool may have determined that the tables do not require any upgrade modifications.
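If you want to double-check why no work was needed, you can inspect whether a given table is actually transactional. The database and table names below (mydb.mytable) are placeholders:
  # An ACID table shows transactional=true under Table Parameters
  $ hive -e "DESCRIBE FORMATTED mydb.mytable" | grep -i transactional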
Problem
The Hive Upgrade ACID Tool fails with the following error:
java.lang.NoClassDefFoundError 
Solution
Make sure all paths in the run command are specified correctly and exist in the file system.
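One quick way to spot a missing entry is to check each classpath element from your command. The CP value below is shortened; replace it with the exact -cp string you used:
  $ CP="/opt/mapr/lib/*:/opt/mapr/hive/hive-2.3/lib/*"
  # Strip the trailing /* from each entry and report anything that does not exist
  $ echo "${CP}" | tr ':' '\n' | sed 's|/\*$||' | while read -r d; do [ -e "$d" ] || echo "MISSING: $d"; done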
Problem
The Hive Upgrade ACID Tool fails with the following error:
Error: Could not find or load main class org.apache.hadoop.hive.upgrade.acid.UpgradeTool 
Solution
Make sure that:
  • The path to the upgrade tool JAR file is specified correctly.
  • The JAR file is included in the classpath option.
  • The JAR file exists within the specified path.
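A quick way to rule out a wrong or incomplete JAR is to list its contents and confirm that the UpgradeTool class is packaged in it. The JAR path below is the example path used earlier; the check requires the unzip utility (jar tf works as well):
  # Expect an entry such as org/apache/hadoop/hive/upgrade/acid/UpgradeTool.class
  $ unzip -l /home/mapr/acid-test/hive-upgrade-acid-3.1.3.0-eep-900.jar | grep UpgradeTool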
Problem
The Hive Upgrade ACID Tool fails with the following log messages (the example log is trimmed for readability):
ERROR [main] acid.UpgradeTool - UpgradeTool failed 
java.lang.NullPointerException 
at org.apache.hadoop.hive.ql.io.AcidUtils.getChildState(AcidUtils.java) 
at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java)  
Solution
The run command most likely does not contain the -execute flag. Make sure the flag is present and includes the leading dash (-execute, not execute).
Problem
The Hive Upgrade ACID Tool fails with the following log messages (the example log is trimmed for readability):
WARN  rpcauth.RpcAuthRegistry - No RpcAuthMethod registerd for authentication method CUSTOM 
ERROR acid.UpgradeTool - UpgradeTool failed 
java.lang.NullPointerException 
at org.apache.hadoop.hive.thrift.ThriftTransportHelper.createMapRSaslTransport(ThriftTransportHelper.java) 
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge25Sasl$Client.createClientTransport(HadoopThriftAuthBridge25Sasl.java) 
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java) 
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java) 
Solution
Most likely too many JARs were specified in the classpath. Do not collect JARs for the classpath with a command such as the following; instead, use exactly the classpath values specified in the command template in step 2:
find /opt/mapr -iname "*.jar" | xargs | tr -s ' ' ':'