Best Practices for Backing Up MapR Information

Lists the best practices and performance considerations to follow when backing up MapR information.

To back up configuration information and data from your MapR cluster, you must install the appropriate Linux backup client from your backup software provider on your servers in your MapR cluster. Your backup client user must have the proper filesystem, and volume permissions. For details on how to configure MapR volume permissions see Creating Volume-level ACLs and Managing Access Controls.

Backup Configuration Data

By default, all installation files on the cluster, for each server in the cluster, are stored in a single directory on each server in the MapR cluster. To ensure that you backup all the configuration files, MapR supported applications, as well as log files, back up the /opt/mapr directory for all servers in the cluster.

Note that the /opt/mapr location includes all log files. Log files can add a significant amount of data to your backup environment, so evaluate if they are needed for your business continuity requirements. To backup just the configuration files for the cluster, backup the /opt/mapr/conf directory from all servers in the cluster.

Backup Volume Data

MapR's recommended way to backup and restore data, is to enable and configure snapshots and volume mirroring for your data, to another MapR cluster. This step ensures that your business continuity and disaster recovery needs are met.

See the following links for setting up Snapshots, Mirroring, Table and Streams replications.

Snapshots: Managing Snapshots
Mirrors: Mirror Volumes
MapR-DB Table Replication: Managing Table Replication
MapR-ES Streams Replication: Stream Replication

If you do not have a secondary cluster to mirror your data, back up your volumes by specifying the following path in your Linux backup agent: /mapr/cluster_name/ - For example: /mapr/my.cluster.com/.

Performance Considerations When Backing Large Data Sets

You could run into bandwidth and performance limitations when you specify only one path to your MapR cluster, where your data in your volumes is stored on only one Linux host agent. The bottleneck can occur due to the size of that data you are backing up (large file sizes), or due to the number of files you have in your directory structure (millions of files in one directory).

To mitigate performance issues, break up the volumes across multiple Linux backup agents, with specific mount paths. For example:

MapR Linux Host 1 (hostname1):
          /mapr/my.cluster.com/volume1
          /mapr/my.cluster.com/volume2
          /mapr/my.cluster.com/volume3

MapR Linux Host 2 (hostname2):
          /mapr/my.cluster.com/volume4
          /mapr/my.cluster.com/volume5
          /mapr/my.cluster.com/volume6

MapR Linux Host 3 (hostname3):
          /mapr/my.cluster.com/volume7
          /mapr/my.cluster.com/volume8
          /mapr/my.cluster.com/volume9

Preserve Metadata About the Volumes

To preserve metadata such as permissions and Access Control Expression (ACE) rules, run a pre-script process as the mapr user, in your backup agent. For example in your pre-script configuration for your host agent for your cluster, you would run:

maprcli volume dump create
        -name volume1 -dumpfile volume1_fulldump1 -e statefile1

Some backup software may need "stderr" or "stdout" codes to run pre or post processing scripts within their product. In that case, you may need to write a bash script to dump the file to a location of your choice, and ensure that your backup agent is configured to backup that directory. Consult your backup software provider's documentation. For information on creating volume dumps, see Create and Maintain Volume Dump File.