Preparing to Upgrade from Hive 2.x to 3.x

Upgrading from Hive 2.x to 3.x requires you to understand data migration, ACID table migration, permissions, folder structures, and artifact naming.

EEP 9.0.0 introduced Hive 3.1.3, while EEP 7.x and 8.x supported Hive 2.3. Any upgrade from EEP 7.x or 8.x to EEP 9.0.0 requires a thorough review of the considerations in this topic. For information about the Hive versions in different EEPs, see Component Versions for Released EEPs.

ACID Table Migration

Hive 3.x supports all Hive 2.x data as is, including tables, partitions, and UDFs, with the exception of ACID (transactional) tables. ACID tables require some actions before you upgrade from Hive 2.x to 3.x.

Hive 3.x changed the on-disk layout of ACID tables. Any ACID table partition that had an Update, Delete, or Merge statement executed against it since the last major compaction must undergo another major compaction before you upgrade to Hive 3.x.

After major compaction starts, do not execute any further Update, Delete, or Merge statements against these tables. Not following this sequence can lead to data corruption. Tables and partitions that contain only the results of Insert statements are fully compatible and do not need to be compacted.

For details, see ACID Table Upgrade Routine.
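
As an illustration of the pre-upgrade step described above, the following Python sketch builds HiveQL ALTER TABLE ... COMPACT 'major' statements for a list of tables and partitions. The table names, partition values, and the helper function are hypothetical. The sketch only produces statement text; it assumes that you identify the affected tables yourself and run the statements through your usual Hive client.

```python
def major_compaction_statement(table, partition=None):
    """Build a HiveQL statement that requests a major compaction.

    `partition` is an optional dict that maps partition column to value.
    Values are quoted naively, so this is only suitable for simple values.
    """
    stmt = f"ALTER TABLE {table}"
    if partition:
        spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        stmt += f" PARTITION ({spec})"
    return stmt + " COMPACT 'major';"


# Hypothetical ACID tables/partitions that had Update, Delete, or Merge
# statements executed since their last major compaction.
targets = [
    ("sales.orders", {"ds": "2023-01-01"}),
    ("sales.customers", None),
]

statements = [major_compaction_statement(t, p) for t, p in targets]
for statement in statements:
    print(statement)
```

After submitting the statements, the SHOW COMPACTIONS statement can be used to check that each compaction has finished before the upgrade begins.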

Permission Processing for New Tables

Hive 3.x dropped the following property:
hive.warehouse.subdir.inherit.perms
Instead of the Hive permission inheritance that was based on the hive.warehouse.subdir.inherit.perms parameter setting, Hive 3.x supports the data-fabric file-system access control model. In Hive 3.x, a new directory receives its permissions from the file-system default value, and all Hive permission-inheritance logic has been removed.
To summarize the new behavior:
  • 777 - default warehouse directory
  • 755 - child directories (no more inheritance)

Table permissions that remain from Hive 2.x are unchanged.
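
One way to see which directories follow the new defaults is sketched below in Python. The sketch assumes that the warehouse directory is reachable through a POSIX path (for example, an NFS or FUSE mount); the function name and the example path are hypothetical. Because table permissions carried over from Hive 2.x are unchanged, directories of older tables may legitimately appear in the result.

```python
import os
import stat


def find_unexpected_modes(warehouse_dir, root_mode=0o777, child_mode=0o755):
    """Return (path, mode) pairs for directories that differ from the defaults.

    The warehouse directory itself is compared with `root_mode`; each
    immediate child directory is compared with `child_mode`.
    """
    mismatches = []
    actual = stat.S_IMODE(os.stat(warehouse_dir).st_mode)
    if actual != root_mode:
        mismatches.append((warehouse_dir, actual))
    for entry in os.scandir(warehouse_dir):
        if entry.is_dir(follow_symlinks=False):
            actual = stat.S_IMODE(entry.stat(follow_symlinks=False).st_mode)
            if actual != child_mode:
                mismatches.append((entry.path, actual))
    return mismatches


# Example call (the path is a placeholder for your mounted warehouse):
# for path, mode in find_unexpected_modes("/mapr/my.cluster.com/user/hive/warehouse"):
#     print(path, oct(mode))
```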

Folder Structure and Versioning

Unlike Hive 2.x, Hive 3.x uses a three-digit version in its directory name, which changes the HIVE_HOME pattern. For example:
Hive Version    HIVE_HOME Pattern
2.x             /opt/mapr/hive/hive-2.3
3.x             /opt/mapr/hive/hive-3.1.3

This change can affect any custom parsing utilities for HIVE_HOME.
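
A custom utility can be made tolerant of both patterns by accepting any number of version components. The following Python sketch shows one possible approach; the function name and the regular expression are illustrative assumptions, not part of any Hive or data-fabric tooling.

```python
import re

# Matches a trailing "hive-<version>" path component whose version has
# two or more dot-separated numeric parts (for example 2.3 or 3.1.3).
_HIVE_HOME_RE = re.compile(r"hive-(\d+(?:\.\d+)+)/?$")


def hive_version_from_home(hive_home):
    """Extract the version from a HIVE_HOME path as a tuple of integers."""
    match = _HIVE_HOME_RE.search(hive_home)
    if match is None:
        raise ValueError(f"Unrecognized HIVE_HOME pattern: {hive_home!r}")
    return tuple(int(part) for part in match.group(1).split("."))


print(hive_version_from_home("/opt/mapr/hive/hive-2.3"))    # (2, 3)
print(hive_version_from_home("/opt/mapr/hive/hive-3.1.3"))  # (3, 1, 3)
```

Returning a tuple keeps version comparisons simple, for example hive_version_from_home(path) >= (3,).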

Artifact Naming

Hive 3.x JAR artifacts use a four-digit version. For example:
hive-A.B.C.D.jar
where:
  • A is the Major version
  • B is the Minor version
  • C is the Patch version
  • D is the EBF/Release version

This change can affect dependency management in custom applications that refer to Hive 3.x.
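
Build scripts or dependency checks that read versions from JAR file names may need to handle the fourth component. The following Python sketch parses a name of the form shown above; the helper, the ArtifactVersion type, and the sample file name (including its release number 0) are hypothetical and only illustrate the A.B.C.D pattern.

```python
import re
from typing import NamedTuple


class ArtifactVersion(NamedTuple):
    major: int    # A
    minor: int    # B
    patch: int    # C
    release: int  # D (EBF/Release version)


# <artifact name>-A.B.C.D.jar, where A, B, C, and D are numeric.
_JAR_RE = re.compile(r"^(?P<name>.+)-(?P<version>\d+\.\d+\.\d+\.\d+)\.jar$")


def parse_artifact(filename):
    """Split a JAR file name into its artifact name and four-part version."""
    match = _JAR_RE.match(filename)
    if match is None:
        raise ValueError(f"Not a four-digit versioned JAR name: {filename!r}")
    parts = (int(p) for p in match.group("version").split("."))
    return match.group("name"), ArtifactVersion(*parts)


# Hypothetical file name that follows the hive-A.B.C.D.jar pattern.
name, version = parse_artifact("hive-3.1.3.0.jar")
print(name, version.major, version.minor, version.patch, version.release)
```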