Overview of Tiers

Describes what warm and cold tiers are.

Data fabric considers data that is active and frequently accessed as "hot" data, and data that is rarely accessed as "warm" or "cold" data. The mechanism used to store hot data is referred to as the hot tier (the data fabric cluster), the mechanism used to store warm data is referred to as the EC tier (a low-cost storage alternative on the data fabric cluster), and the mechanism used to store cold data is referred to as the cold tier (a low-cost storage alternative on the cloud). Hot, warm, and cold data is identified based on the rules and policies set by the administrator.

Data starts off as hot when it is first written to local storage (on the data fabric cluster). It becomes warm or cold based on the rules and policies the administrator configures. Data can then be automatically offloaded, using the data fabric automated storage tiering (MAST) Gateway service, either to the erasure coded volume on the low-cost storage alternative on the data fabric cluster (warm tier) or to a low-cost storage alternative on a third-party cloud object store (cold tier), such as AWS S3.
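
As a purely illustrative sketch (not the data fabric policy engine; the thresholds and function names below are invented for the example), an age-based rule of the kind an administrator might configure could be pictured like this:

```python
from datetime import datetime, timedelta

# Hypothetical age thresholds; real rules and policies are set by the administrator.
WARM_AFTER = timedelta(days=30)   # candidate for the erasure coded volume (warm tier)
COLD_AFTER = timedelta(days=180)  # candidate for the cloud object store (cold tier)

def classify(last_access: datetime, now: datetime | None = None) -> str:
    """Return the tier a file would belong to under this example rule."""
    age = (now or datetime.now()) - last_access
    if age >= COLD_AFTER:
        return "cold"   # offload target: e.g. an AWS S3 bucket
    if age >= WARM_AFTER:
        return "warm"   # offload target: the erasure coded volume on the cluster
    return "hot"        # stays on local storage on the data fabric cluster

print(classify(datetime.now() - timedelta(days=45)))   # -> warm
```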

On the data fabric cluster, every volume enabled for erasure coding (or warm tiering) acts as a "front-end" volume and has a corresponding hidden erasure coded (or EC) volume in the specified topology (of the low-cost storage alternative). Erasure coding (EC) is a data protection technique in which data is broken into fragments (m pieces) and encoded with extra redundant fragments (n pieces) to guard against disk failures. That is, for volumes configured for erasure coding, file data in the volume is broken into m fragments and encoded with a pre-configured number (n) of redundant fragments. In the event of disk failure, any m of the m + n fragments can be used to recover the original file. See Erasure Coding Scheme for Data Protection and Recovery for more information.
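
The m + n idea can be sketched with the simplest possible erasure code, a single XOR parity fragment (n = 1): any m of the m + 1 fragments are enough to rebuild the data. This is only a toy illustration, not the data fabric's erasure coding implementation, which supports schemes that tolerate n failures:

```python
def encode(data: bytes, m: int) -> list[bytes]:
    """Split data into m equal fragments and append one XOR parity fragment (n = 1)."""
    frag_len = -(-len(data) // m)                  # ceiling division
    padded = data.ljust(m * frag_len, b"\0")       # pad so all fragments are equal
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(m)]
    parity = bytearray(frag_len)
    for frag in frags:
        for i, b in enumerate(frag):
            parity[i] ^= b
    return frags + [bytes(parity)]

def reconstruct(frags: list, orig_len: int) -> bytes:
    """Rebuild the original data when at most one fragment (data or parity) is lost."""
    missing = [i for i, f in enumerate(frags) if f is None]
    assert len(missing) <= 1, "single-parity XOR tolerates only one lost fragment"
    if missing and missing[0] < len(frags) - 1:    # a data fragment was lost
        frag_len = len(next(f for f in frags if f is not None))
        rebuilt = bytearray(frag_len)
        for f in frags:
            if f is not None:
                for i, b in enumerate(f):
                    rebuilt[i] ^= b                # XOR of the survivors = lost fragment
        frags[missing[0]] = bytes(rebuilt)
    return b"".join(frags[:-1])[:orig_len]         # drop parity, strip padding

msg = b"file data in a warm-tier volume"
pieces = encode(msg, m=4)                          # 4 data fragments + 1 parity fragment
pieces[2] = None                                   # simulate a failed disk
assert reconstruct(pieces, len(msg)) == msg
```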

Although you write to and read from the front-end volumes, the front-end volume is akin to a staging area, where the volume's data is held on demand. Data written to a volume is periodically moved to the back-end erasure coded volume, releasing the disk space for the front-end volume on the filesystem and providing the space savings of erasure coded volumes. Data in the front-end volume is moved to the corresponding erasure coded volume based on an offload schedule. The front-end volume holds only a small amount of required data, and data is shuffled between the front-end volume and the corresponding erasure coded volume as required. See Data Reads, Writes, and Recalls for more information.
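
The staging behavior can be pictured with a toy model (a sketch only; the class and method names are invented and this is not the data fabric implementation): writes land in the front-end volume, the scheduled offload moves data to the back-end EC volume and frees the front-end space, and a read of offloaded data recalls it on demand.

```python
class TieredVolume:
    """Toy model of a front-end volume with a hidden back-end EC volume."""

    def __init__(self):
        self.front_end = {}   # data staged locally on the data fabric cluster
        self.ec_volume = {}   # offloaded, erasure coded data (simulated)

    def write(self, path: str, data: bytes) -> None:
        self.front_end[path] = data              # new data always starts out hot

    def offload(self) -> None:
        """Run by the offload schedule: move data out and free front-end space."""
        self.ec_volume.update(self.front_end)
        self.front_end.clear()

    def read(self, path: str) -> bytes:
        if path not in self.front_end:                     # not staged locally:
            self.front_end[path] = self.ec_volume[path]    # recall on demand
        return self.front_end[path]

vol = TieredVolume()
vol.write("/data/a.log", b"...")
vol.offload()                                    # scheduled offload frees front-end space
assert vol.read("/data/a.log") == b"..."         # read triggers a recall
```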

There is also a visible tier-volume on the data fabric cluster for storing the metadata associated with the volume. When you create a warm tier, the tier volume named mapr.internal.tier.<tierName> is, by default, created in the /var/mapr/tier path. When you create a warm-tier volume using the ecenable parameter or the Control System, a warm tier is automatically created, and the corresponding tier volume named mapr.internal.tier.autoec.<volName>.<creationTime> is, by default, created in the /var/mapr/autoectier path.
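
The default names above can be illustrated with simple string formatting (the tier name, volume name, and creationTime value below are placeholders; only the naming pattern comes from this section):

```python
def warm_tier_volume_name(tier_name: str) -> str:
    """Default tier-volume name for an explicitly created warm tier (created in /var/mapr/tier)."""
    return f"mapr.internal.tier.{tier_name}"

def autoec_tier_volume_name(vol_name: str, creation_time: str) -> str:
    """Default tier-volume name for an auto-created warm tier (created in /var/mapr/autoectier)."""
    return f"mapr.internal.tier.autoec.{vol_name}.{creation_time}"

print(warm_tier_volume_name("mytier"))                 # mapr.internal.tier.mytier
print(autoec_tier_volume_name("vol1", "1633024800"))   # mapr.internal.tier.autoec.vol1.1633024800
```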

While three-way replicated regular volumes require three times the amount of disk space of the data they store, erasure coded volumes reduce the storage overhead to the range of 1.2x to 1.5x. On the data fabric cluster, only the metadata of the volume in the namespace container is three-way replicated.
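
For example, a 4 + 2 scheme stores 6 fragments for every 4 fragments of data (1.5x overhead) and a 10 + 2 scheme stores 12 for every 10 (1.2x); these scheme sizes are common examples used here only to show where the 1.2x to 1.5x range comes from.

```python
def storage_overhead(m: int, n: int) -> float:
    """Raw-to-usable storage ratio for an m + n erasure coding scheme."""
    return (m + n) / m

print(storage_overhead(4, 2))       # 1.5 -> upper end of the 1.2x-1.5x range
print(storage_overhead(10, 2))      # 1.2 -> lower end of the range
print(3 / storage_overhead(4, 2))   # 2.0 -> 4 + 2 uses half the raw space of 3-way replication
```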

You can create one warm tier per volume using the Control System, the CLI, or the REST API, or you can create and associate multiple volumes, with different erasure coding schemes, with the same warm tier using the CLI and REST API (only). You cannot associate the same warm tier with multiple volumes using the Control System.

On the data fabric cluster, every cold tier (referred to as a remote target in the Control System) has a bucket on the third-party cloud store where volume data is offloaded based on the policy configured by the administrator. Volume data, in 64 KB chunks, is packed into 8 MB objects and offloaded to the bucket on the tier, and the corresponding volume metadata is stored in a visible tier-volume as HPE Ezmeral Data Fabric Database tables on the data fabric cluster. During writes and reads, volume data is recalled to the data fabric cluster if necessary. Data written to the volume is periodically moved to the remote target, releasing the disk space on the filesystem. See Data Reads, Writes, and Recalls for more information.
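
As a rough illustration of the packing described above (the volume size is a made-up example, and the exact object count depends on the actual data layout):

```python
CHUNK_SIZE = 64 * 1024           # 64 KB data chunks
OBJECT_SIZE = 8 * 1024 * 1024    # packed into 8 MB objects

chunks_per_object = OBJECT_SIZE // CHUNK_SIZE
print(chunks_per_object)         # 128 chunks fit in one object

volume_bytes = 100 * 1024**3     # e.g. 100 GiB of volume data to offload
objects = -(-volume_bytes // OBJECT_SIZE)   # ceiling division
print(objects)                   # 12800 objects written to the cold-tier bucket
```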

Data stored on the data fabric cluster requires three times the amount of disk space of the regular volume on premium hardware due to replication (the default being 3). After offloading to the cloud, the space used by the data (including data in the namespace container) in the volume on the data fabric cluster is freed, and only the metadata of the volume in the namespace container is three-way replicated on the data fabric cluster.
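
A back-of-the-envelope comparison of the replicated footprint before and after a cold offload (the volume and metadata sizes are made-up example values):

```python
REPLICATION = 3                       # default replication factor

volume_data = 1 * 1024**4             # 1 TiB of volume data (example value)
metadata = 2 * 1024**3                # 2 GiB of namespace metadata (example value)

before = REPLICATION * (volume_data + metadata)   # everything 3-way replicated on premium hardware
after = REPLICATION * metadata                    # after offload, only the metadata stays replicated

print(f"before offload: {before / 1024**4:.2f} TiB of raw disk")   # ~3.01 TiB
print(f"after offload:  {after / 1024**3:.2f} GiB of raw disk")    # 6.00 GiB
```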

There is also a visible tier-volume on the data fabric cluster for storing the metadata associated with the volume. When you create a cold tier, the tier volume named mapr.internal.tier.<tierName> is, by default, created in the /var/mapr/tier path. A directory for the volumes associated with the tier, identifiable by volume ID, is created under this path after the first offload of data from the volume to the tier.

You can create one tier per volume, or create and associate multiple volumes with the same tier, using the Control System, the CLI, or the REST API.

See also: Managing Tiers