Loading Data into Binary Tables
The most common way of loading data into a MapR-DB binary table is with a put operation. At large scale, however, bulk loads offer a performance advantage over put operations. A bulk load can be performed as a full bulk load or as an incremental bulk load.
Bulk loading is supported by the following tools, each of which can be used for both full and incremental bulk loads:
- MapR's hbase CopyTable utility, which copies MapR-DB binary table data to another MapR-DB binary table. This utility is different from Apache HBase's CopyTable utility. When copying data from one MapR-DB binary table to another, use the MapR-DB version, because it copies table metadata, access control expressions, and more in addition to table data:
  hbase com.mapr.fs.hbase.tools.mapreduce.CopyTable
- Apache HBase's CopyTable utility, which can be used to bulk load HBase binary data into MapR-DB binary tables.
- MapR's hbase ImportFiles utility, which imports HFile or Result files into MapR-DB binary tables. The command takes the following form:
  hbase com.mapr.fs.hbase.tools.mapreduce.ImportFiles -Dmapred.reduce.tasks=2 -inputDir <input directory, for example: /test/tabler.kv> -table <table name, for example: /table2> [-format <Result|HFile>] [-sample <true|false>] [-mapOnly <true|false>]
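The ImportFiles invocation above can also be assembled from a script, for example when many input directories are loaded in turn. The following is a minimal sketch, assuming Python 3.8 or later and that the hbase launcher is on the PATH of the node that runs it. The function name import_files_argv is hypothetical, and it only reproduces the options shown in the usage line; it does not validate paths or check that the table exists.

```python
import shlex
from typing import List, Optional

# Fully qualified class name taken from the ImportFiles usage line.
IMPORT_FILES_CLASS = "com.mapr.fs.hbase.tools.mapreduce.ImportFiles"


def import_files_argv(
    input_dir: str,
    table: str,
    reduce_tasks: int = 2,
    fmt: Optional[str] = None,
    sample: Optional[bool] = None,
    map_only: Optional[bool] = None,
) -> List[str]:
    """Hypothetical helper: build the argv list for an ImportFiles run.

    Optional flags are omitted when left as None, mirroring the
    bracketed (optional) parts of the usage line.
    """
    if fmt is not None and fmt not in ("Result", "HFile"):
        raise ValueError("fmt must be 'Result' or 'HFile'")

    # The -D property comes first, as in the documented usage line.
    argv = [
        "hbase",
        IMPORT_FILES_CLASS,
        f"-Dmapred.reduce.tasks={reduce_tasks}",
        "-inputDir", input_dir,
        "-table", table,
    ]
    if fmt is not None:
        argv += ["-format", fmt]
    if sample is not None:
        argv += ["-sample", str(sample).lower()]   # true|false
    if map_only is not None:
        argv += ["-mapOnly", str(map_only).lower()]  # true|false
    return argv


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(shlex.join(import_files_argv("/test/tabler.kv", "/table2", fmt="HFile")))
```

Returning a list rather than a single string lets the caller pass it straight to subprocess.run(argv, check=True) without shell quoting issues; printing it with shlex.join first is a cheap way to review the command before starting the MapReduce job.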
Full Bulk Loads
Full bulk loads offer the greatest performance advantage for empty binary tables. A full bulk load can only be performed on an empty table, and it skips the write-ahead log (WAL) typical of Apache HBase and MapR-DB binary table operations, which increases performance.
Tables are unavailable for normal client operations, including put, get, and scan operations, while a full bulk load operation is in progress.
Incremental Bulk Loads
Incremental bulk loads can add data to existing tables concurrently with other table operations, with better performance than put operations. This type of bulk load uses write-ahead log files.
You can use incremental bulk loads to ingest large amounts of data into an existing table. Tables remain available for standard client operations such as put, get, and scan while the bulk load is in progress. A table can receive multiple incremental bulk loads simultaneously.
For tables created with the maprcli table create command, with the hbase shell's create command, or in MCS, incremental bulk loads are supported by default.
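The rules in the two sections above reduce to a simple choice: a full bulk load needs an empty table and blocks put, get, and scan while it runs, whereas an incremental bulk load works on existing tables and keeps them available. The sketch below encodes only that choice; it is a hypothetical helper for planning scripts, not part of any MapR tool, and it assumes the caller already knows whether the table is empty and whether clients must keep access during the load.

```python
from enum import Enum


class BulkLoadMode(Enum):
    FULL = "full"
    INCREMENTAL = "incremental"


def choose_bulk_load_mode(table_is_empty: bool, clients_need_access: bool) -> BulkLoadMode:
    """Hypothetical helper: pick a bulk load type from the documented constraints."""
    # A full bulk load is only possible on an empty table, and the table is
    # unavailable for put, get, and scan until the load finishes.
    if table_is_empty and not clients_need_access:
        return BulkLoadMode.FULL
    # Otherwise fall back to an incremental bulk load, which works on
    # existing tables and keeps them available to clients.
    return BulkLoadMode.INCREMENTAL


if __name__ == "__main__":
    print(choose_bulk_load_mode(table_is_empty=True, clients_need_access=False).value)
```

Preferring the full bulk load whenever both conditions hold follows from the statement that full bulk loads give the greatest performance advantage for empty tables.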