Loading Data into Binary Tables

A bulk load can be performed as a full bulk load or as an incremental bulk load.

The most common way to load data into a MapR-DB binary table is with a put operation. At large scales, however, bulk loads offer a performance advantage over put operations.

The following tools support bulk loading, and each can be used for both full and incremental bulk load operations:

  • MapR's hbase CopyTable utility, which copies MapR-DB binary table data to another MapR-DB binary table. This utility is different from Apache HBase's CopyTable utility. When copying data between MapR-DB binary tables, use the MapR-DB version, which copies table metadata, access control expressions, and more in addition to the table data.
    hbase com.mapr.fs.hbase.tools.mapreduce.CopyTable
  • Apache HBase's CopyTable utility, which can be used to bulk load HBase binary data into MapR-DB binary tables.
  • MapR's hbase ImportFiles utility, which imports HFile or Result files into MapR-DB binary tables. For example:
    hbase com.mapr.fs.hbase.tools.mapreduce.ImportFiles
       -Dmapred.reduce.tasks=2 
       -inputDir < input directory, for example: /test/tabler.kv >
       -table < table name, for example: /table2 >
       [ -format < Result|HFile > ]
       [ -sample < true|false > ]
       [ -mapOnly < true|false > ]
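
As a rough illustration of how the ImportFiles options above fit together, the following sketch assembles one invocation from the example values in the usage text (/test/tabler.kv, /table2, two reduce tasks). The build_import_cmd helper is hypothetical and not part of MapR. It only prints the command, so you can review it before running it on a cluster.

```shell
#!/bin/sh
# build_import_cmd is a hypothetical helper (not a MapR tool).
# It prints an ImportFiles command line; it does not run it.
build_import_cmd() {
  input_dir=$1   # directory holding the files to import
  table=$2       # path of the target MapR-DB binary table
  format=$3      # Result or HFile, per the usage text
  reducers=$4    # value passed as -Dmapred.reduce.tasks

  # Reject any format the usage text does not list.
  case $format in
    Result|HFile) ;;
    *) echo "format must be Result or HFile" >&2; return 1 ;;
  esac

  printf 'hbase com.mapr.fs.hbase.tools.mapreduce.ImportFiles -Dmapred.reduce.tasks=%s -inputDir %s -table %s -format %s\n' \
    "$reducers" "$input_dir" "$table" "$format"
}

# Print the command for the example values used in the usage text.
build_import_cmd /test/tabler.kv /table2 HFile 2
```

The optional -sample and -mapOnly switches are left out here because this page does not describe what they do.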

Full Bulk Loads

Full bulk loads offer the greatest performance advantage, but they can be performed only on an empty binary table. A full bulk load skips the write-ahead log (WAL) that Apache HBase and MapR-DB binary table operations normally use, which results in increased performance.

NOTE: You can perform a full bulk load only on empty tables that have the bulk load attribute set to true. You can set this value only when creating a table.

Tables are unavailable for normal client operations, including put, get, and scan operations, while a full bulk load operation is in progress.
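
The sequence for a full bulk load might look like the sketch below, which prints each step rather than running it. The -bulkload option of maprcli table create and maprcli table edit is an assumption taken from the MapR command-line tools, not from this page, so confirm it against the documentation for your release. The last step, which switches the attribute off so that the table accepts normal client operations again, is likewise an assumption. The full_bulk_load_plan helper is hypothetical; the table and input paths are the example values from the ImportFiles usage text.

```shell
#!/bin/sh
# full_bulk_load_plan is a hypothetical helper (not a MapR tool).
# It prints the planned steps in order; nothing is executed.
full_bulk_load_plan() {
  table=$1       # path of the new, empty table
  input_dir=$2   # directory holding the HFile or Result files

  # Step 1 (assumed option): create the table with the bulk load attribute on.
  echo "maprcli table create -path $table -bulkload true"

  # Step 2: load the files with ImportFiles.
  echo "hbase com.mapr.fs.hbase.tools.mapreduce.ImportFiles -inputDir $input_dir -table $table"

  # Step 3 (assumed option): turn the attribute off to restore client access.
  echo "maprcli table edit -path $table -bulkload false"
}

full_bulk_load_plan /table2 /test/tabler.kv
```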

Incremental Bulk Loads

Incremental bulk loads can add data to existing tables concurrently with other table operations, with better performance than put operations. This type of bulk load makes use of write-ahead log files.

NOTE: Tables are available for client operations, such as put, get, and scan operations, during incremental bulk loads.

You can use incremental bulk loads to ingest large amounts of data into an existing table. A table can accept multiple incremental bulk load operations simultaneously.
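
Because a table can take several incremental bulk loads at once, one way to use this is to start one ImportFiles job per input directory in the background and then wait for all of them. The sketch below only generates such a script as text. The incremental_load_script helper and the /data/batch1 and /data/batch2 directories are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# incremental_load_script is a hypothetical helper (not a MapR tool).
# It prints a small shell script: one background ImportFiles job per
# input directory, followed by "wait" so the script blocks until all finish.
incremental_load_script() {
  table=$1   # path of the existing target table
  shift      # remaining arguments are input directories

  for dir in "$@"; do
    printf 'hbase com.mapr.fs.hbase.tools.mapreduce.ImportFiles -inputDir %s -table %s &\n' \
      "$dir" "$table"
  done
  echo "wait"
}

# Print the generated script for two hypothetical input directories.
incremental_load_script /table2 /data/batch1 /data/batch2
```

After reviewing the printed script, you could save it to a file and run it with sh on a node where the hbase command is available.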

NOTE: Incremental bulk loads are supported by default, whether you create a table with the maprcli table create command, with the hbase shell’s create command, or in MCS.