Enabling Tiering

Describes how to enable data tiering using either the Control System or the CLI.

For a primer on Data Tiering, see Data Tiering.

On all new installations, data tiering is enabled and available for all new volumes. If you are upgrading, you must enable data tiering; see Step 4: Enable New Features for more information. Once the data tiering functionality is enabled, you can selectively enable tiering for a volume at the time of volume creation using either the Control System or the CLI.

Data tiering is available only for new volumes; you cannot enable data tiering for existing volumes. Enable tiering for new volumes where read/write latency is not the dominant concern. You can decide later whether to use local (warm) or remote (cold) tiering. After data tiering is enabled for a volume, it cannot be disabled.

Enabling Tiering Using the Control System

  1. On the Create New Volume page in the Control System, move the Data Tier slider to Yes to enable tiering.

    Proceed to the next step only if you wish to select a tier type for the volume. You can create a tiering-enabled volume without selecting a tier type and select a tier type later by editing the volume.
    Note: You cannot disable tiering for a volume after it is enabled.
  2. (Optional) Select Erasure Coding (for warm tiering) or Remote Archiving (for cold tiering) from the Tiering Type drop-down menu.

    You:
    • Can enable a volume for either warm or cold tiering, but not for both.
    • Cannot modify the tier type after the volume is created.
  3. Specify all other required and optional properties for creating the volume and click Create Volume.

    For information on required and optional properties, see Creating a Volume.

Enabling Tiering Using the CLI

  • Run the following command to enable tiering:

    maprcli volume create -name <vol-name> -path <mount-path> -tieringenable true
    For more information, see volume create.
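
As a minimal sketch of scripting this step, the following shell function wraps the command above and refuses to run with an empty volume name or mount path. The function name create_tiered_volume and the example volume name and path are hypothetical; only the maprcli volume create options shown above are taken from this page.

```shell
# Sketch: create a tiering-enabled volume.
# create_tiered_volume is a hypothetical helper name, not part of maprcli.
create_tiered_volume() {
  vol_name="$1"
  mount_path="$2"
  if [ -z "$vol_name" ] || [ -z "$mount_path" ]; then
    echo "usage: create_tiered_volume <vol-name> <mount-path>" >&2
    return 2
  fi
  # Tiering cannot be disabled after the volume is created,
  # so treat this flag as a one-way decision.
  maprcli volume create -name "$vol_name" -path "$mount_path" -tieringenable true
}

# Example call with made-up values:
#   create_tiered_volume projects.archive /projects/archive
```

Because tiering can be enabled only at volume creation, wrapping the call this way helps avoid creating a volume with a missing name or path that would then have to be recreated.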

Introduction to Parallel Offload

The MAST Gateway uses parallel threads to rapidly offload tiering-enabled volumes to either cold or warm tiers.

Prior to data-fabric version 6.2, by default only one MAST Gateway offloads the data of any given volume. All tiering tasks, such as offload, recall, and compaction, are scheduled only on that one assigned gateway. This design has the following limitations:

  • Multiple tasks contend for the limited threads on the assigned node, resulting in slowdowns.
  • EC encoding performance is limited by the single gateway.
  • S3 offload throughput is limited as well.
  • MAST Gateway utilization is skewed: some nodes may be idle while others are overutilized.
  • When the assigned gateway goes offline, the volume is assigned a new gateway and the tasks are restarted.

To mitigate these single-gateway issues, a new cluster that is set up with version 6.2 software uses multiple MAST Gateways in parallel to offload the data of a single volume.

Parallel Offload uses one primary MAST Gateway and multiple secondary MAST Gateways per volume. The primary gateway coordinates tasks across secondary gateways and reports their final status to CLDB.

Only volume-level offload tasks are sharded. File-level tasks and all non-offload volume tasks run on the primary gateway itself.

Advantages of Parallel Offload

Multiple MAST Gateways provide the following advantages:

  • Increased per-volume throughput
  • Use of idle or unused cluster nodes
  • Increased per-gateway efficiency
  • Efficient usage of network bandwidth
  • Local reads that improve offload efficiency
  • Rescheduling of only a subset of tasks if a gateway goes offline
  • Resilience to failures
  • Efficient volume-level and task-level load balancing

Resiliency with Parallel Offload

Parallel Offload is resilient to the following scenarios:
Restart of the Primary Gateway
  • Secondary gateways continue to run assigned tasks while the primary gateway is down or restarting.
  • CLDB reassigns the volume to another primary gateway.
  • CLDB restarts the tasks on the new primary gateway.
  • The primary gateway polls/reschedules the ongoing secondary gateway tasks.
Restart of the Secondary Gateway
  • The primary gateway detects the failure of secondary gateway tasks when it polls the secondary gateway.
  • The primary gateway reschedules tasks that were terminated when the secondary gateway restarted.
Restart/Switchover of CLDB
  • CLDB reassigns the volume to the same primary gateway.
  • CLDB reschedules the pending volume task on the same primary gateway.

Load Balancing with Parallel Offload

Load balancing occurs at two levels:
Volume Level
  • CLDB assigns each volume to the gateway that has the fewest volumes.
  • The Gateway Balancer reassigns volumes across gateways.
Task Level
  • CLDB balances tasks across MFS nodes.
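
The volume-level assignment rule can be illustrated with a short shell sketch. It only illustrates a least-loaded selection policy and is not CLDB's actual implementation; the function name, the input format (one "<gateway> <volume-count>" pair per line), and the gateway names are all assumptions made for the example.

```shell
# Illustrative only: pick the gateway that currently serves the fewest volumes.
# Input (stdin): one "<gateway> <volume-count>" pair per line (hypothetical format).
# Lines with fewer than two fields are ignored; ties go to the gateway listed first.
pick_least_loaded_gateway() {
  awk '
    NF >= 2 && (!seen || ($2 + 0) < min) { min = $2 + 0; gw = $1; seen = 1 }
    END { if (seen) print gw }
  '
}

# Example with made-up gateway names (prints gw-b):
#   printf '%s\n' 'gw-a 4' 'gw-b 1' 'gw-c 3' | pick_least_loaded_gateway
```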

Enabling Parallel Offload on an Upgraded Cluster

When the cluster is upgraded to version 6.2 from a previous release, only one MAST Gateway is used to offload the data of a single volume. To use multiple MAST Gateways to offload a volume’s data in parallel, enable the parallel offload feature:

maprcli cluster feature enable -name mfs.feature.container.sharding.support

To check whether parallel offloads are enabled, run:

maprcli config load -json | grep -i shard

A value of 0 indicates that parallel offloads are enabled. For example:

"mastgateway.disable.sharding":"0"
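
The check above can be scripted. The following is a minimal sketch that reads the output of maprcli config load -json on standard input and reports the state of the mastgateway.disable.sharding setting. The function name parallel_offload_status is hypothetical, and treating any value other than 0 as "not enabled" is an assumption; this page states only that 0 means enabled.

```shell
# Sketch: report whether parallel offload is enabled.
# Reads the output of `maprcli config load -json` on stdin.
# parallel_offload_status is a hypothetical helper name.
parallel_offload_status() {
  # Extract the quoted value that follows the "mastgateway.disable.sharding" key.
  value="$(sed -n 's/.*"mastgateway\.disable\.sharding"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)"
  case "$value" in
    0)  echo "enabled"; return 0 ;;
    "") echo "unknown"; return 2 ;;      # key not found in the input
    *)  echo "not enabled"; return 1 ;;  # assumption: any other value means disabled
  esac
}

# Usage on a cluster node:
#   maprcli config load -json | parallel_offload_status
```

The distinct return codes (0, 1, 2) let a calling script distinguish "disabled" from "setting not found", which may be useful on clusters where the feature has not yet been enabled.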