HPE Ezmeral Data Fabric Database JSON CopyTable

Copies data from one HPE Ezmeral Data Fabric Database JSON table to another HPE Ezmeral Data Fabric Database JSON table.

If the destination table does not exist, mapr copytable creates it with the same metadata (column families and access-control expressions) as the source table, and then copies the data.

If the destination table exists, mapr copytable copies data only.

Required Permissions

The user that runs mapr copytable must have the following permissions, which you can grant with access-control expressions:
  • The permission readAce on the volume where the source table is located, and the permission writeAce on the volume where the destination table is or will be located.
  • The permission adminperm on the source table.
  • The permission for column-family and column reads (readperm) on the data in the source table that you want to copy.
  • When bulkload = false (default), the permission for column writes (writeperm) on the destination table.
  • When bulkload = true, the permission to load the destination table with bulk loads (bulkloadperm).
  • If the destination table does not yet exist, the permission createrenamefamily on the source table.

For information about how to set permissions on volumes, see Setting Whole Volume ACEs.

For information about how to set permissions on tables, see Enabling Table and Stream Authorizations with ACEs.

NOTE: The mapr user is not treated as a superuser. HPE Ezmeral Data Fabric Database does not allow the mapr user to run this utility unless that user is granted the relevant permissions with access-control expressions.

Syntax

mapr copytable 
-src <source table path>
-dst <destination table path>
[-fromID <start key>]
[-toID <end key>]
[-bulkload <true|false> (default: false)]
[-mapreduce <true|false> (default: true)]
[-cmpmeta <true|false> (default: true)]
[-numthreads <number of threads> (default: 16)]
[-maxsplits <integer> (default: 2000)] 

Parameters

Parameter Description
src The path of the table that you want to copy from.
dst The path of the table that you want to copy to.
fromID The value of the _id field in the first document of the range of documents to copy. startRow is an alias for this parameter.
toID The value of the _id field in the last document of the range of documents to copy. stopRow is an alias for this parameter.
bulkload A Boolean value that specifies whether or not to perform a full bulk load of the table. The default is not to use bulk loading (false). To use bulk load, you must set the -bulkload parameter of the table to true by running the command maprcli table edit -path <path to table> -bulkload true.
mapreduce A Boolean value that specifies whether or not to use a MapReduce program to perform the copying operation. The default, preferred method is to use a MapReduce program (true). When this parameter is set to false, a client process uses multiple threads to read rows of the source table and write rows to the destination table.
cmpmeta A Boolean value that specifies whether or not to compare table metadata such as column families and ACEs. The default is to compare metadata (true). The comparison is performed only when the destination table already exists before mapr copytable is run; it also checks that the user ID that runs mapr copytable has the proper permissions on the destination table. Set the value of this parameter to false before copying a table that contains a single column family to a table that contains two or more column families.
numthreads When -mapreduce is false, this parameter specifies the number of threads allocated to perform the copying of data. The default is 16. If additional CPU resources are available, you might want to increase the number of threads to achieve better performance.
maxsplits Sets the maximum number of presplit tablets for the destination table. The default is 2000. If copytable fails with an Error NO ENTry message during table creation, the operation could not complete within the timeout (10 minutes); in that case, reduce the value of -maxsplits. This functionality requires a patch. See Applying a Patch.
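
The bulkload description above implies a sequence of steps when copying into an existing destination table. The following is a minimal sketch of that sequence, not a definitive procedure. The run and bulk_copy helpers, the DRY_RUN variable, and the table paths are hypothetical names chosen for illustration. The final step, which switches the table back with -bulkload false, is an assumption that the destination should return to normal operation once the copy finishes. With DRY_RUN left at its default of 1, the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: bulk-load copy into an existing destination table.
# run, bulk_copy, and DRY_RUN are hypothetical helpers, not part of the mapr CLI.

# Print the command when DRY_RUN=1 (default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

bulk_copy() {
  src=$1
  dst=$2
  # 1. Put the destination table into bulk-load mode.
  run maprcli table edit -path "$dst" -bulkload true || return 1
  # 2. Copy the data with bulk loading enabled.
  run mapr copytable -src "$src" -dst "$dst" -bulkload true || return 1
  # 3. Assumption: return the table to normal mode after the copy.
  run maprcli table edit -path "$dst" -bulkload false
}

# Example paths are placeholders.
bulk_copy /user1/tableA /user1/tableB
```

Set DRY_RUN=0 to execute the commands instead of printing them. The user running them needs bulkloadperm on the destination table, as listed under Required Permissions.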

Example

The following example copies documents starting from ID user000001 to ID user009999:

[user@hostname ~]$ mapr copytable -src /user1/tableA -dst /mapr/clusterB/vol1/tableB -fromID user000001 -toID user009999
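
As a variation on the example above, the following sketch copies the same key range with a multithreaded client process instead of MapReduce, using the -mapreduce and -numthreads parameters described under Parameters. The copy_range function, the DRY_RUN variable, and the thread count of 32 are illustrative assumptions and not part of mapr copytable. With DRY_RUN left at its default of 1, the script only prints the command it would run.

```shell
#!/bin/sh
# Sketch: copy a key range with client threads instead of MapReduce.
# copy_range and DRY_RUN are hypothetical helpers, not part of the mapr CLI.

copy_range() {
  src=$1
  dst=$2
  from_id=$3
  to_id=$4
  threads=${5:-16}   # 16 matches the documented -numthreads default

  # Build the command as positional parameters so arguments stay intact.
  set -- mapr copytable -src "$src" -dst "$dst" \
         -fromID "$from_id" -toID "$to_id" \
         -mapreduce false -numthreads "$threads"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"   # print only; nothing is copied
  else
    "$@"                 # run the real copy
  fi
}

# Same tables and key range as the example above, with 32 threads.
copy_range /user1/tableA /mapr/clusterB/vol1/tableB user000001 user009999 32
```

Raising the thread count only helps when the client host has spare CPU resources, as noted in the numthreads description.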

Monitoring mapr copytable Operations

Use one of the following methods to monitor the progress of the copying of table data:
  • If the copy table operation runs as a MapReduce v2 application, monitor the application using the ResourceManager UI.
  • If the copy table operation runs as a client process, go to the Tables view of the destination table in the MapR Control System. Then, on the Region tab, monitor the pace at which the number of rows increases.