Configuring MSCK REPAIR TABLE

This section describes how to configure and run the MSCK REPAIR TABLE command, which compares the partitions recorded in the Hive metastore with the partitions in the file system and updates the metastore to match.

Use the MSCK REPAIR TABLE command to manually update (ADD, DROP, or SYNC) the partitions in the Hive metastore so that they match the partitions in file systems such as HDFS and Amazon S3.

For example, you specify the filesystem location when you create a Hive table. If you later add partitions to or delete partitions from the filesystem, the partitions in the filesystem and in the Hive metastore become inconsistent.

Run the MSCK REPAIR TABLE command to compare the partitions in the filesystem with the partitions in the Hive metastore and update the Hive metastore accordingly:
MSCK [REPAIR] TABLE <table name> [ADD/DROP/SYNC PARTITIONS];
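The comparison that the three options perform can be pictured as a set difference. The following is a minimal Python sketch, not Hive's implementation: the function name plan_repair, the partition names, and the assumption that ADD registers partitions found only in the filesystem, DROP removes partitions found only in the metastore, and SYNC does both are illustrative. It does not connect to Hive.

```python
def plan_repair(fs_partitions, metastore_partitions, mode="ADD"):
    """Return (to_add, to_drop) for a hypothetical repair run.

    fs_partitions: partition names found under the table location.
    metastore_partitions: partition names registered in the metastore.
    mode: "ADD", "DROP", or "SYNC" (assumed semantics, see lead-in).
    """
    mode = mode.upper()
    if mode not in {"ADD", "DROP", "SYNC"}:
        raise ValueError(f"unsupported mode: {mode}")

    fs = set(fs_partitions)
    ms = set(metastore_partitions)

    # Partitions on disk that the metastore does not know about.
    to_add = sorted(fs - ms) if mode in {"ADD", "SYNC"} else []
    # Partitions in the metastore whose directories no longer exist.
    to_drop = sorted(ms - fs) if mode in {"DROP", "SYNC"} else []
    return to_add, to_drop


# Hypothetical example data: two new directories on disk, one stale entry.
fs_parts = ["dt=2024-01-01", "dt=2024-01-02", "dt=2024-01-03"]
ms_parts = ["dt=2023-12-31", "dt=2024-01-01"]

for mode in ("ADD", "DROP", "SYNC"):
    print(mode, plan_repair(fs_parts, ms_parts, mode))
```

In this sketch, SYNC is simply the union of the ADD and DROP plans, which is why it is the option to reach for when partitions have been both added and deleted outside of Hive.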
Configure the Hive Metastore with the following Hive property:
Property: hive.msck.repair.batch.max.retries
Default: 0
Description: Maximum number of retries for the MSCK REPAIR command when adding unknown partitions. If the value is greater than zero, the command retries adding unknown partitions until the maximum number of attempts is reached or the batch size is reduced to zero, whichever comes first. On each retry, the batch size is halved until it reaches zero. If the value is zero, the command retries until the batch size becomes zero, as described above.
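To make the retry behavior concrete, the following Python sketch lists the batch sizes that would be attempted if every attempt failed. It is an illustration under stated assumptions, not Hive code: the function name attempted_batch_sizes and the initial_batch_size parameter are hypothetical, halving is assumed to use integer division, and max_retries is assumed to cap the total number of attempts.

```python
def attempted_batch_sizes(initial_batch_size, max_retries):
    """List the batch sizes tried when every attempt fails.

    initial_batch_size: hypothetical starting batch size (> 0).
    max_retries: value of hive.msck.repair.batch.max.retries;
        0 means "keep halving until the batch size reaches zero".
    """
    sizes = []
    batch = initial_batch_size
    while batch > 0:
        # A positive max_retries caps the number of attempts (assumption).
        if max_retries > 0 and len(sizes) >= max_retries:
            break
        sizes.append(batch)
        # Each retry halves the batch size (integer division assumed).
        batch //= 2
    return sizes


print(attempted_batch_sizes(100, 0))  # no cap: halve until zero
print(attempted_batch_sizes(100, 3))  # capped at three attempts
```

With a starting batch size of 100, the uncapped case tries seven progressively smaller batches before the size reaches zero, while a value of 3 stops after the third attempt.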