Data Compaction and Recall Criteria

This topic describes the criteria that the MAST gateway uses to decide whether to compact a container (data container or namespace container) and whether to run recall expiry on recalled data.

Containers are of two types:

  • Namespace containers
  • Data containers

Containers are also classified by size:

  • Large containers: A container is a large container when the number of inodes in the container is greater than the value of the configuration variable mastgateway.offload.opt.largenuminodes.
  • Non-large containers: A container is a non-large container when the number of inodes in the container is less than the value of the configuration variable mastgateway.offload.opt.largenuminodes.
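
The size classification above can be sketched as a single comparison. This is only an illustration, not the gateway's code: the function name and the parameter large_num_inodes (standing in for the value of mastgateway.offload.opt.largenuminodes) are hypothetical, and the handling of an inode count exactly equal to the configured value is an assumption, because the text does not specify it.

```python
def classify_container(num_inodes: int, large_num_inodes: int) -> str:
    """Return "large" or "non-large" using the inode-count rule.

    large_num_inodes stands in for mastgateway.offload.opt.largenuminodes.
    Assumption: a count exactly equal to the configured value is treated
    as non-large, since the text only covers "greater" and "less".
    """
    if num_inodes > large_num_inodes:
        return "large"
    return "non-large"
```

For example, with a hypothetical configured value of 1000, a container with 1500 inodes is classified as large and one with 200 inodes as non-large.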

Compaction Criteria for Large Containers

Compaction is carried out for a large container (namespace container or data container) when the size of the garbage in the container is greater than the garbage threshold. The garbage threshold is the value of the configuration variable mastgateway.ctc.opt.largenuminodes.threshmb (default value is 2 GB).

Compaction is skipped for a large container when the size of the garbage in the container is less than the garbage threshold.
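
As a rough sketch of the rule above, the decision reduces to comparing the garbage size with the threshold. The function name is hypothetical, the sizes are assumed to be expressed in MB (suggested by the threshmb suffix), and the 2 GB default is written as 2048 MB on the assumption that 1 GB is 1024 MB.

```python
# Default of mastgateway.ctc.opt.largenuminodes.threshmb (2 GB),
# assuming 1 GB = 1024 MB.
DEFAULT_GARBAGE_THRESH_MB = 2048


def should_compact_large(garbage_mb: float,
                         thresh_mb: float = DEFAULT_GARBAGE_THRESH_MB) -> bool:
    """Return True when a large container qualifies for compaction.

    Assumption: garbage exactly equal to the threshold does not qualify,
    because the text only describes the "greater than" case as compacting.
    """
    return garbage_mb > thresh_mb
```

With the default threshold, a large container holding 3072 MB of garbage would qualify, while one holding 512 MB would be skipped.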

Recall Expiry Criteria for Large Containers

If data has been recalled from a tier into a Data Fabric cluster, and the size of the recalled data is greater than the value configured for mastgateway.recallexp.opt.largenuminodes.minpurgemb, the compactor purges the qualified recalled data from the container.

If data has been recalled, and the size of the recalled data is less than the value configured for mastgateway.recallexp.opt.largenuminodes.minpurgemb, recall expiry is skipped and the recalled data is retained in the container of the tiered volume.

Skip Compaction for Large Containers with Garbage Size Greater than Garbage Threshold

You might want to skip the scheduled compaction for a very large container and run the compaction manually at a convenient time.

For this purpose, set the configuration variable mastgateway.ctc.opt.largenuminodes.skipqualifiedctrs.enabled (default value is 0) to 1. For details on this configuration variable, refer to config.

When mastgateway.ctc.opt.largenuminodes.skipqualifiedctrs.enabled is set to 1, scheduled compaction is skipped for large containers whose garbage size is greater than the garbage threshold. CLDB raises the alarm VOLUME_ALARM_COMPACTION_SKIPPED_LARGE_CONTAINER when compaction is skipped for a large namespace container that exceeds the threshold.

When compaction is skipped in this way, you can force compaction on the qualified containers by running the maprcli volume compact command manually. Refer to Compaction Skipped Large Container Volume Alarm for the alarm details.
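
The interaction between the garbage threshold, the skip setting, and the alarm can be sketched as follows. This is an illustrative model only: the function name, the container_type strings, and the returned tuple are hypothetical. The text states that the alarm is raised for large namespace containers and is silent about data containers, so this sketch assumes no alarm in that case.

```python
from typing import Optional, Tuple

ALARM = "VOLUME_ALARM_COMPACTION_SKIPPED_LARGE_CONTAINER"


def scheduled_compaction_action(
    garbage_mb: float,
    container_type: str,          # hypothetical labels: "namespace" or "data"
    skip_qualified_enabled: int,  # value of ...skipqualifiedctrs.enabled (0 or 1)
    thresh_mb: float = 2048,      # 2 GB default, assuming 1 GB = 1024 MB
) -> Tuple[str, Optional[str]]:
    """Return (action, alarm) for a scheduled run on a large container."""
    if garbage_mb <= thresh_mb:
        # Below the garbage threshold: compaction is skipped, no alarm.
        return ("skip", None)
    if skip_qualified_enabled == 1:
        # Qualified container, but skipping was requested by configuration.
        # Assumption: the alarm applies to namespace containers only.
        alarm = ALARM if container_type == "namespace" else None
        return ("skip", alarm)
    return ("compact", None)
```

A container that returns a "skip" action together with the alarm is the case in which a manual run of maprcli volume compact is the way to force compaction.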

Compaction Criteria for Non-large Containers

Non-large containers are compacted by default.

Recall Expiry Criteria for Non-large Containers

If the size of the recalled data in a non-large container is greater than the configured recall expiry minimum threshold (mastgateway.recallexp.opt.minpurgemb, default value is 8 MB), recall expiry occurs on the recalled data. The compactor purges the qualified recalled data from the tiered volume. By comparison, the threshold for large containers, mastgateway.recallexp.opt.largenuminodes.minpurgemb, has a default value of 2 GB.
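
The recall expiry rules for both container sizes can be summarized in one sketch that picks the threshold according to the container size. The function and parameter names are hypothetical, sizes are assumed to be in MB, and the 2 GB default is written as 2048 MB on the assumption that 1 GB is 1024 MB.

```python
def recall_expiry_runs(
    recalled_mb: float,
    is_large: bool,
    min_purge_mb: float = 8,           # mastgateway.recallexp.opt.minpurgemb
    large_min_purge_mb: float = 2048,  # ...recallexp.opt.largenuminodes.minpurgemb
) -> bool:
    """Return True when recalled data in a container qualifies for purging.

    Large containers are checked against the large-container threshold,
    non-large containers against the general minimum threshold.
    Assumption: a size exactly equal to the threshold does not qualify.
    """
    threshold = large_min_purge_mb if is_large else min_purge_mb
    return recalled_mb > threshold
```

With the defaults, 100 MB of recalled data would be purged from a non-large container but retained in a large one.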

Refer to config for information about the configuration variables, mastgateway.recallexp.opt.largenuminodes.minpurgemb and mastgateway.recallexp.opt.minpurgemb.