Creating a Volume

Describes volume types and shows how to create volumes using the Control System, the CLI, or the REST API.

About this task

You can create a new (Standard or Mirror) volume using the Control System, the CLI or the REST API.

Creating a Volume Using the Control System

About this task

To create a new (Standard or Mirror) volume using the Control System:

Procedure

  1. Go to Data > Volumes and click Create Volume.
    NOTE The Volumes page is under the Volumes menu in the Kubernetes version of the Control System.
    The Create New Volume page displays.
  2. Select the Volume Type under SETTINGS in the Properties tab. Choose:
    • Standard Volume to create a read-write volume.
    • Mirror Volume to create a volume that is a read-only copy of a source volume.
      TIP See also: Mirror Types.
  3. Specify the following required settings in the Properties tab. For a standard volume:
    Volume Name Enter a name for the volume.
    The name should contain only the following characters:
    A-Z a-z 0-9 _ - .
    NOTE
    • The volume name should not begin with mapr. because mapr. is used for system volumes. If you use mapr. at the start of the volume name, the volume may not display in the default view of the list of volumes on the Control System; you must select the Include System Volumes checkbox in the Volumes pane to view volumes with names beginning with mapr.
    • For tiering-enabled volumes, the volume name should not exceed 98 characters.
    Accountable Entity Select the type of accountable entity (AE) and enter the name of the entity in the associated text field.
    Steps 4 to 8 are optional and allow you to define optional volume properties (for auditing, replication, and data tiering), volume access, and volume quota. If these are not defined, default values, where available, are used. You can skip to:
    • (Optional) Step 9 to associate a snapshot schedule and/or an offload schedule with the volume.
    • Step 10 to create the volume with basic settings.
    For a mirror volume, specify the following required settings instead:
    Volume Name Enter a name for the volume.
    The name should contain only the following characters:
    A-Z a-z 0-9 _ - .
    NOTE Do not begin the volume name with mapr. because mapr. is used for system volumes. If you use mapr. at the start of the volume name, the volume may not display in the default view of the list of volumes on the Control System; you must select the Include System Volumes checkbox in the Volumes pane to view volumes with names beginning with mapr.
    Source Volume Cluster Enter the name of the cluster on which the source volume exists.
    The name should contain only the following characters:
    A-Z a-z 0-9 _ - .
    Mirroring only works between two secure clusters or between two non-secure clusters. Mirroring does not work when one cluster is secure and the other is non-secure.
    NOTE When you set up mirror volumes for mirroring between clusters, servers in one cluster cannot use the same IP addresses as servers in the other cluster; otherwise, the mirroring operation cannot run successfully. For example, if node A in cluster A has a private IP address of 10.10.20.29, no server in cluster B can have the same private IP address. Also, all the servers in the destination cluster must be able to reach all the servers in the source cluster and vice versa. For example, suppose 10.10.20.29 is the only IP address used by node A in cluster A; then all servers in cluster B should be able to reach the IP address 10.10.20.29.
    Source Volume Name After selecting the source volume cluster, enter the name of the source volume from which the mirror volume pulls data.
    The name should contain only the following characters:
    A-Z a-z 0-9 _ - .
    If the source volume is on:
    • The same cluster, you must create a local mirror volume, which is useful for load balancing or for providing a read-only copy of a data set. See Local Mirroring for more information.
    • Another cluster, you must create a remote mirror volume, which is useful for offsite backup, for data transfer to remote facilities, and for load and latency balancing. See Creating Remote Mirrors for more information.
    NOTE If you plan to enable tiering for the mirror volume, ensure that the selected source volume is also tiering-enabled. You cannot create tiering-enabled mirror volumes to mirror data in standard volumes that are not tiering-enabled.
    For information on setting up mirror cascades, see Mirror Cascades.
    Accountable Entity Select the type of accountable entity (AE) and enter the name of the entity in the associated text field.

    Steps 4 to 8 are optional and allow you to define optional volume properties (for auditing and replication), volume access, and volume quota. If these are not defined, default values, where available, are used. You can skip to:

    • (Optional) Step 9 to associate a mirroring schedule, a snapshot schedule, and/or an offload schedule with the mirror volume.

      The mirroring schedule specifies how often to run the mirroring operation, which copies data (that has changed at the file-block level since the last data transfer) directly from the source, even if the files in the source volume are being written to or deleted.

      The snapshot schedule specifies how often to take a snapshot of the mirror volume for the purpose of preserving the state of the mirror before a subsequent mirroring operation.

      The offload schedule specifies how often to offload data in the volume to the configured tier based on the criteria in the storage policy.

    • Step 10 to create the mirror volume with basic settings.
  4. (Optional) Specify the following general and auditing settings under Properties:
    Mount Specifies whether to automatically mount (Yes) or not mount (No) the volume after creation. By default, volumes are mounted immediately after creation. If this is set to Yes, you must also specify the mount path.
    Mount Path The path to mount the volume. This is required if the value for Mount is Yes.
    NOTE The path must be relative to / and cannot be in the form of a global namespace path (for example, /mapr/<cluster-name>/).
    Data Tier Specifies whether (Yes) or not (No) to enable data tiering for the volume.
    Collect Metrics Specifies whether (Yes) or not (No) to enable metrics collection for this volume. For more information, see Collecting Volume Metrics and Enabling Volume Metric Collection.
    Volume Access Specifies whether the volume is read-only or read/write. Data on a read-only volume can only be read; data on a read/write volume can be both read and written. By default, a standard volume is created with read/write access. A mirror volume can only be read-only.
    Enable Auditing Select one of the following:
    • Disable auditing
    • Enable auditing for volume so that auditing can selectively be enabled for directories, files, tables, and streams in the volume
    • Enable auditing of operations on all files, tables, and streams in the volume whether or not auditing is enabled for files, tables, and streams in the volume.
    Coalesce Interval The interval of time (in minutes) to use when logging multiple READ, WRITE, or GETATTR operations on one file from one client IP address, if auditing is enabled. The default value is 60 minutes.
    Data on Wire Encryption Specifies whether (Yes) or not (No) to encrypt data in the volume during transmission. By default, the value is Yes for all new volumes on a secure cluster. This parameter is not supported on non-secure clusters.
    Data at Rest Encryption Specifies whether (Yes) or not (No) to encrypt data at rest. This parameter is not supported on non-secure clusters.
  5. (Optional) Specify the following under Properties for volume (hot) data replication.
    Topology The rack path to the volume. The default topology is /data.
    Optimize Replication for The basis for the replication factor:
    • High throughput, or cascading replication, where volumes are replicated sequentially on intermediate and tail containers.
    • Low latency, or star replication, where volumes are replicated on multiple containers in parallel.
    The default value is high throughput. See Selecting a Replication Type for High Availability.
    Replication The minimum (Minimum Replication) and desired (Target Replication) number of copies for the volume data. The default minimum is 2, and the default target is 3.
    Name Container Replication The minimum (Minimum Replication) and desired (Target Replication) number of copies for the name container associated with the volume. The default minimum is 2, and the default target is 3.
    Guarantee Min Replication Specifies whether (Yes) or not (No) to enforce the minimum number of copies. If this is enabled (Yes), writes succeed only when the minimum number of copies exists; if the minimum number of copies is not available, the client is asked to retry.

    For more information, see Understanding Replication.

  6. (Optional) If data tier is enabled (Yes), select one of the following, and set values for associated properties.
    For offloading data to an erasure coded volume, specify values for the following properties. If values are not specified, default values are applied.
    Topology The topology of the erasure coded volume from the drop-down list.
    Storage Policy The rule for offloading data in this volume. You can click:
    • Browse to select an existing rule.
    • Create to create a new rule for offloading data. See steps 3 - 5 in the Creating a Storage Tier Policy topic for more information.
    If you do not select a storage policy, the default policy named default.ectier.rule, which is all files (p), is associated with the volume.
    Erasure Coding The erasure coding scheme, which is the number of data chunks and number of parity chunks. You can select one of the following from the Scheme drop-down menu:
    • 3,2 — for 3 data chunks and 2 parity chunks. You must have 5 or more nodes on the cluster for this option. If selected, this has about 67% storage overhead (2 parity chunks for every 3 data chunks) and can tolerate failure of up to 2 nodes.
    • 4,2 — for 4 data chunks and 2 parity chunks. You must have 6 or more nodes on the cluster for this option. If selected, this has 50% storage overhead and can tolerate failure of up to 2 nodes.
    • 5,2 — for 5 data chunks and 2 parity chunks. You must have 7 or more nodes on the cluster for this option. If selected, this has 40% storage overhead and can tolerate failure of up to 2 nodes.
    • 6,3 — for 6 data chunks and 3 parity chunks. You must have 9 or more nodes on the cluster for this option. If selected, this has 50% storage overhead and can tolerate failure of up to 3 nodes.
    NOTE You can create a volume even if the required number of nodes is not present, but the offload operation then fails.
    See Erasure Coding Scheme for Data Protection and Recovery for more information.
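The node requirements in the scheme list above follow from the chunk counts: a scheme needs at least one node per chunk (data plus parity) and tolerates as many node failures as it has parity chunks. The following Python sketch is illustrative only and is not part of the product; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Schemes from the list above, as (data chunks, parity chunks).
SCHEMES: Dict[str, Tuple[int, int]] = {
    "3,2": (3, 2),
    "4,2": (4, 2),
    "5,2": (5, 2),
    "6,3": (6, 3),
}


def min_nodes(scheme: str) -> int:
    """Nodes needed for offload to succeed: one per data or parity chunk."""
    data, parity = SCHEMES[scheme]
    return data + parity


def tolerated_failures(scheme: str) -> int:
    """Node failures the scheme can tolerate: one per parity chunk."""
    return SCHEMES[scheme][1]


def usable_schemes(node_count: int) -> List[str]:
    """Schemes whose node requirement is met by a cluster of node_count nodes."""
    return [name for name in SCHEMES if min_nodes(name) <= node_count]


if __name__ == "__main__":
    for nodes in (5, 6, 9):
        print(nodes, usable_schemes(nodes))
```

A check like this can be run before choosing a scheme, because the volume can be created on a smaller cluster but the offload would fail.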
    For offloading data to a low cost storage alternative on the cloud, specify values for the following properties.
    Remote Target The location to offload data to. You can click:
    • Browse to select an existing tier.
    • Create to create a new tier. See Creating a Storage Tier for more information.
    Storage Policy The rule for offloading data in this volume. You can click Browse to select an existing rule, or click Create to create a new rule.
    Retention Duration after Recall The number of days to retain data recalled from the tier to the MapR cluster. Once the number of days is reached, recalled data on the MapR cluster is purged (if there are no changes), or offloaded (if there are changes).
    Tier Encryption Specifies whether (Yes) or not (No) to enable encryption of data on the tier. This cannot be modified once it is set. The default value is No (disabled).
  7. (Optional) Specify the following volume (administrative and user) access control settings in the Authorization tab:
    ADMINISTRATIVE CONTROLS The users and/or groups that have one or more of the following permissions:
    Dump & Backup
    Transport large amounts of data or copies of the volume on physical media to a remote cluster using backup files.
    Restore & Mirror
    Restore a volume from a dump file and create mirror volumes, which are read-only copies of a source volume.
    Edit
    Edit volume properties, create and delete snapshots.
    Delete
    Delete the volume.
    Admin
    View and edit access control settings (but cannot perform volume operations).
    Full Control
    Perform all volume-related administrative operations except changing access control settings.
    To define administrative access control settings for another user or group, click Add Another.
    NOTE To perform this action from the command line, refer to acl set.
    By default, the root user and the volume creator have full control permissions on the volume.
    Root Directory Permission
    Set read, write, and/or execute permissions on the root directory for users, groups and others.
    USER ACCESS CONTROLS The users, groups, and/or roles that have and/or do not have access to read and/or write to the volume.
    To grant or block access to users, groups, and/or roles, from the:
    • Basic settings, select the type (public, or user, group, or role) from the drop-down menu, specify the name of the user, group, or role, and select one or more checkboxes to grant permissions.
      TIP Use the icons next to an access control setting to create a copy of it or to remove it.
      To add Access Control Expressions (ACEs) for another user, group, or role, click Add Another and repeat this step.
    • Advanced settings, specify the public (p), user (u), group (g), and/or role (r) entities that have or do not have each type of access, using boolean expressions and subexpressions built from the following operators:
      • ! — Negation operator.
      • & — AND operation.
      • | — OR operation.
      Use parentheses, ( ), to group subexpressions.
      NOTE You cannot specify user, group, or role individually if access is granted to all users (public).

      Alternatively, click the icon associated with the type of access to use the Access Control Expression window to define access for public or for users, groups, and/or roles. See Defining ACEs for more information.

    NOTE If you switch from Basic to Advanced, the basic settings, if any, are carried over to the advanced settings. If you switch from Advanced to Basic, all the settings are lost because the subexpressions and AND (&) and negation (!) operations that are supported by advanced settings are not supported in the basic settings.
    By default, access is granted to all users (public). If access is granted to all users (public), access cannot also be granted individually to users, groups, and/or roles.
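To show how the advanced-settings operators in step 7 combine, here is a minimal Python sketch of an evaluator for such expressions. It is not the product's implementation. It assumes the term forms p, u:<name>, g:<name>, and r:<name>, and it assumes that ! binds tighter than &, which binds tighter than |; treat both as assumptions and use parentheses when in doubt. All function names are hypothetical.

```python
import re

# One token: an operator or parenthesis, the public term "p",
# or a user/group/role term such as "u:alice" (assumed syntax).
TOKEN = re.compile(r"\s*([()!&|]|p(?![\w:])|[ugr]:[\w.\-]+)")


def tokenize(expr):
    expr = expr.rstrip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected text: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expr, user, groups=(), roles=()):
    """Return True if the expression grants access to the given identity."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # parse first so the tokens are always consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if tok == "p":
            return True  # public: everyone matches
        if tok is not None and ":" in tok:
            kind, name = tok.split(":", 1)
            if kind == "u":
                return name == user
            if kind == "g":
                return name in groups
            return name in roles  # kind == "r"
        raise ValueError(f"unexpected token: {tok!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return result
```

For example, under these assumptions the expression g:eng & !u:bob grants access to members of the hypothetical group eng other than the user bob.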
  8. (Optional) In the Usage tab, define the amount of disk space, in megabytes (MB), gigabytes (GB), or terabytes (TB), to allocate to the volume:
    By default, there is no limit on the disk space.
    WARNING If the quota set for the mirror volume is less than the quota set for its source volume, the CLDB raises an alarm. The mirroring operation does not fail, but you must decide whether to add space and increase the mirror volume quota, or remove unwanted space from the source volume and decrease its quota.
    Advisory Quota An alarm is raised when this limit is reached.
    Hard Quota Writes are not allowed when this limit is reached.

    When you set a hard quota for a tiering-enabled volume, the quota is the total space allocated for the volume irrespective of the location (cluster or tier) where the volume data is stored. For example, if you allocate 1GB of hard quota for a tiering-enabled volume, writes fail after you write 1GB of data whether or not the volume data is local (on the cluster), or offloaded (to the tier).

    NOTE The value for advisory quota cannot be greater than the value for hard quota.
    The graph shows the disk space currently in use (across volumes), the reserve limit (which is 90% of the total disk space on the cluster), and the total amount of disk space at the cluster level.
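The quota rule in this step (the advisory quota cannot exceed the hard quota) can be checked once both values are in the same unit. The Python sketch below is an illustration, not product code. It assumes binary multiples (1 GB = 1024 MB) and treats a missing quota as "no limit"; both are assumptions, and the function names are hypothetical.

```python
# Size of each unit in MB; binary multiples are an assumption of this sketch.
UNITS_IN_MB = {"MB": 1, "GB": 1024, "TB": 1024 * 1024}


def to_mb(value, unit):
    """Convert a quota value in MB, GB, or TB to megabytes."""
    return value * UNITS_IN_MB[unit.upper()]


def quotas_are_consistent(advisory=None, hard=None):
    """Check that the advisory quota is not greater than the hard quota.

    Each quota is a (value, unit) tuple, or None for "no limit".
    """
    if advisory is None or hard is None:
        return True  # nothing to compare when either limit is unset
    return to_mb(*advisory) <= to_mb(*hard)
```

For instance, an advisory quota of 500 MB with a hard quota of 1 GB passes the check, while an advisory quota of 2 GB with the same hard quota does not.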
  9. (Optional) From the Schedule tab, select the schedule(s) to associate with the volume.
    You can click Browse to select an existing schedule, or click Create to create a schedule, for the following:
    Snapshot Schedule The snapshot schedule specifies when snapshots should be automatically created.

    If you do not define this schedule, snapshots are not created for this volume.

    Each snapshot created by a schedule has an expiration date that determines how long the snapshot is retained.
    Tier Offload The tier offload schedule specifies the frequency for automatically offloading data in the volume to the configured tier. This schedule is available only for tiering-enabled volumes.

    For a mirror volume, you can click Browse to select an existing schedule, or click Create to create a schedule, for the following:

    Snapshot Schedule The snapshot schedule specifies how often to take a snapshot of the mirror volume to preserve the state of the mirror before a subsequent mirror operation. If corrupt data is copied from the source volume's snapshot into the mirror volume, you can roll back the mirror contents to the snapshot. Each snapshot created by a schedule has an expiration date that determines how long the snapshot is retained.
    Mirror Schedule The mirror schedule specifies how frequently the mirror volume must be synchronized with the source volume. If a disaster (or any type of data loss on a read-write source volume) occurs, the data can be recovered from the mirror volume, but any data written to the source volume since the last successful mirror operation is not on the mirror volume. Therefore, you should set the mirror schedule such that it meets your Recovery Point Objective (RPO).
    TIP For best performance, select a mirroring schedule according to the anticipated rate of data changes and the available bandwidth for mirroring. See also: Guidelines for Setting Mirror Schedules.
    Tier Offload The tier offload schedule specifies the frequency for automatically offloading data in the volume to the configured tier. This schedule is available only for tiering-enabled volumes.
    When selecting a schedule, you can choose your custom schedule, if one exists, or one of the following defaults:
    • Critical data
    • Important data
    • Normal data
    • Automatic Tiering Scheduler (available only for tiering-enabled volumes)
  10. Click Create Volume to create the volume.

Creating a Volume Using the CLI or the REST API

About this task

The basic command to create a volume is:

/opt/mapr/bin/maprcli volume create -name <name>

You can create a standard read-write volume or a (local or remote) mirror volume, and either type of volume can be tiering-enabled.

The name should contain only the following characters:
A-Z a-z 0-9 _ - .
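The naming rules on this page (the allowed characters, the mapr. prefix reserved for system volumes, and the 98-character limit for tiering-enabled volumes) can be checked before a command is run. The following Python sketch is one way to do that; the function name and the messages are hypothetical.

```python
import re

# Allowed characters, as listed above: A-Z a-z 0-9 _ - .
NAME_RE = re.compile(r"[A-Za-z0-9_.\-]+")


def volume_name_problems(name, tiering_enabled=False):
    """Return a list of rule violations; an empty list means the name is fine."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("name may contain only A-Z a-z 0-9 _ - .")
    if name.startswith("mapr."):
        problems.append("names beginning with mapr. are used for system volumes")
    if tiering_enabled and len(name) > 98:
        problems.append("tiering-enabled volume names should not exceed 98 characters")
    return problems


if __name__ == "__main__":
    for candidate in ("project-data_01", "bad name!", "mapr.test"):
        print(candidate, volume_name_problems(candidate))
```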

The basic command to create a standard volume is:

/opt/mapr/bin/maprcli volume create -name <volume name> -path <volume path> -type rw

When creating and mounting a volume, the location of the mount point can be specified using the path parameter. The volume that is last in the path parameter is referred to as the parent volume. (The parent volume is the volume on which the volume link is created.)

For complete reference information, see volume create.
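When volumes are created from a script, the standard-volume command above can be assembled as an argument list rather than a hand-written string. The Python sketch below only builds and prints the command; it uses just the -name, -path, and -type parameters shown on this page, and the helper name is hypothetical.

```python
import shlex

MAPRCLI = "/opt/mapr/bin/maprcli"


def standard_volume_command(name, path=None):
    """Build the argument list for creating a standard read-write volume."""
    argv = [MAPRCLI, "volume", "create", "-name", name]
    if path is not None:
        # The mount path must be relative to /, as noted for Mount Path.
        if not path.startswith("/"):
            raise ValueError("mount path must start with /")
        argv += ["-path", path]
    argv += ["-type", "rw"]
    return argv


if __name__ == "__main__":
    argv = standard_volume_command("project-data", "/projects/data")
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

On a cluster node, the returned list could be passed to subprocess.run(argv, check=True); that call is left out here so the sketch runs anywhere.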

To create a local mirror volume:

  1. Log in to the node on the cluster where you want to create the mirror.
  2. Use the volume create command to create the mirror volume.

    Specify the source volume name, provide a name for the mirror volume, and specify the type as mirror. Optionally, associate a schedule with the volume. For example:

    /opt/mapr/bin/maprcli volume create -name volume-a-mirror -source volume-a -type mirror -schedule 2
    For complete reference information, see volume create.
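Mirror creation uses the same command for local and remote sources; only the -source value changes, taking the <volume>@<cluster> form when the source is on another cluster. The Python sketch below builds the argument list for either case. It uses only the parameters shown on this page, and the helper name and example volume names are hypothetical.

```python
from typing import List, Optional


def mirror_volume_command(
    name: str,
    source_volume: str,
    source_cluster: Optional[str] = None,
    schedule_id: Optional[int] = None,
) -> List[str]:
    """Build the argument list for creating a local or remote mirror volume."""
    # A remote source is written as <volume>@<cluster>; a local one as <volume>.
    if source_cluster is None:
        source = source_volume
    else:
        source = f"{source_volume}@{source_cluster}"
    argv = [
        "/opt/mapr/bin/maprcli", "volume", "create",
        "-name", name, "-source", source, "-type", "mirror",
    ]
    if schedule_id is not None:
        argv += ["-schedule", str(schedule_id)]
    return argv


if __name__ == "__main__":
    print(mirror_volume_command("volume-a-mirror", "volume-a", schedule_id=2))
    print(mirror_volume_command("volume-a", "volume-a", "cluster-1", schedule_id=2))
```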

To create a remote mirror volume:

  1. Log in to the node on the cluster where you want to create the mirror.
  2. Use the volume create command to create the mirror volume.

    Specify the source volume and cluster in the format <volume>@<cluster>, provide a name for the mirror volume, and specify the type as mirror. Optionally, associate a schedule with the volume. For example:

    /opt/mapr/bin/maprcli volume create -name volume-a -source volume-a@cluster-1 -type mirror -schedule 2

    When creating and mounting a volume, the location of the mount point can be specified using the path parameter. The volume that is last in the path parameter is referred to as the parent volume. (The parent volume is the volume on which the volume link is created.)

    For complete reference information, see volume create.
  3. Ensure that each cluster can resolve the nodes in the other cluster.
    See:

The basic command to create a tiering-enabled volume is the following:

/opt/mapr/bin/maprcli volume create -name <volName> -path <volmountpath> -tieringenable true

For the complete list of supported parameters, see volume create.
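This page does not show a REST request, so the following Python sketch is only a hedged illustration of how the CLI parameters might map to one. It assumes that the REST endpoint mirrors the CLI as https://<host>:8443/rest/volume/create with the command parameters passed as query parameters; the host name, the port, and the helper name are assumptions to verify against the volume create reference. The sketch builds the URL and does not send it.

```python
from urllib.parse import urlencode


def volume_create_url(host, params, port=8443):
    """Build a REST URL for volume create (assumed endpoint layout)."""
    # Parameter names are taken from the CLI flags without the leading dash.
    return f"https://{host}:{port}/rest/volume/create?{urlencode(params)}"


if __name__ == "__main__":
    url = volume_create_url(
        "cldb.example.com",  # hypothetical host
        {"name": "vol1", "path": "/vol1", "tieringenable": "true"},
    )
    print(url)
```

Sending the request would additionally need an HTTP client and cluster credentials, which are outside the scope of this sketch.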