Loading Data into Binary Tables

Bulkload operations can be performed as a full bulkload or as an incremental bulkload.

The most common way of loading data into a MapR Database binary table is with a put operation. At large scales, however, bulk loads offer a performance advantage over put operations.

Bulk loading is supported by the following tools, which can be used for both full and incremental bulkload operations:

  • HBase CopyTable utility for MapR Database binary tables, which copies table data, table metadata, access control expressions, and more to another MapR Database binary table:
    hbase com.mapr.fs.hbase.tools.mapreduce.CopyTable
  • HBase ImportFiles utility, which imports HFile or Result files into MapR Database binary tables. For example:
    hbase com.mapr.fs.hbase.tools.mapreduce.ImportFiles
       -Dmapred.reduce.tasks=2 
       -inputDir < input directory, for example: /test/tabler.kv >
       -table < table name, for example: /table2 >
       [ -format < Result|HFile > ]
       [ -sample < true|false > ]
       [ -mapOnly < true|false > ]
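
As a filled-in illustration of the ImportFiles template above, the following sketch assembles a command line from the example values shown there (/test/tabler.kv and /table2) and prints it instead of running it, so it can be reviewed first. The helper name importfiles_cmd and the choice of HFile as the format value are assumptions made for this example only.

```shell
#!/bin/sh
# Hypothetical helper: prints an ImportFiles command line for review.
# It does not contact a cluster or run any job.
# Usage: importfiles_cmd INPUT_DIR TABLE [FORMAT]
importfiles_cmd() {
  input_dir=$1
  table=$2
  format=${3:-HFile}   # Result or HFile; HFile is picked here for illustration
  printf 'hbase com.mapr.fs.hbase.tools.mapreduce.ImportFiles -Dmapred.reduce.tasks=2 -inputDir %s -table %s -format %s\n' \
    "$input_dir" "$table" "$format"
}

# Print the command for the example input directory and table.
importfiles_cmd /test/tabler.kv /table2
```

Copy the printed line into a shell on a cluster node to run the import.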

Full Bulk Loads

Full bulkload operations offer the greatest performance advantage because they skip the write-ahead log (WAL) that MapR Database binary table operations typically use. A full bulkload can be performed only on an empty table that has the bulkload attribute set to true. You can set this attribute to true only when you create the table.

When you set the bulkload attribute, you cannot enable replication on the table. Because setting the attribute effectively disables logging on the table, MapR Database also does not capture the log data that Elasticsearch could use to index the table.

IMPORTANT Tables are unavailable for normal client operations, including put, get, and scan operations, while a full bulkload operation is in progress.

To create a MapR Database binary table for bulkloading, use one of the following:

  • maprcli table create command with the -bulkload parameter set to true.

  • Apache HBase shell create command with the BULKLOAD parameter set to true. For example:
    hbase> create '/a0','f1', BULKLOAD => 'true'
    If you want to pre-split a table, separate the BULKLOAD parameter from the SPLITS parameter. For example:
    hbase> create '/t1', 'f1', {SPLITS => ['10', '20', '30']}, {BULKLOAD => 'true'} 
  • Control System, with the Will table be bulkload? option set to Yes under table PROPERTIES.
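
For the maprcli option in the list above, a minimal sketch of the command might look like the following. Only the -bulkload parameter is named in this section; the use of -path to identify the table is an assumption based on common maprcli table usage, and the helper name create_bulkload_table_cmd is hypothetical. The sketch prints the command rather than running it, and reuses the /a0 table name from the HBase shell example.

```shell
#!/bin/sh
# Hypothetical helper: prints a maprcli command that would create a table
# with the bulkload attribute set to true.
# Assumption: -path identifies the table; verify against your maprcli version.
create_bulkload_table_cmd() {
  printf 'maprcli table create -path %s -bulkload true\n' "$1"
}

# Print the command for the /a0 table used in the HBase shell example.
create_bulkload_table_cmd /a0
```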

NOTE Attempting a full bulkload on a table that does not have the bulkload attribute set to true causes an incremental bulkload to be performed instead.
After you perform a full bulkload on a table, you cannot perform a full bulkload on it again. Specifically:
  • You cannot use the maprcli table edit command to set the bulkload parameter to TRUE again.
  • You cannot use the Apache HBase shell alter command to set the BULKLOAD parameter to TRUE again.
  • In the Control System, the Will table be bulkload? option cannot be modified after table creation.

Incremental Bulk Loads

Incremental bulk loads can add data to existing tables concurrently with other table operations, with better performance than put operations. Unlike a full bulk load, this type of bulk load uses write-ahead log files.

NOTE Tables are available for client operations, such as put, get, and scan operations, during incremental bulk loads.

You can use incremental bulk loads to ingest large amounts of data into an existing table. A table can run multiple incremental bulk load operations simultaneously.

NOTE Whether you create a table with the maprcli table create command, with the Apache HBase shell create command, or in the Control System, incremental bulk loads are supported by default.
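
Because a table can run several incremental bulk loads at once, one way to drive them is to start multiple ImportFiles jobs in the background and wait for all of them. The sketch below is a dry run by default: it only prints the commands. The RUN variable, the load_all function, and the input directories /test/batch1.kv and /test/batch2.kv are hypothetical names chosen for this example; /table2 is reused from the earlier ImportFiles example, and options other than -inputDir and -table are omitted.

```shell
#!/bin/sh
# Dry run by default: with RUN=echo each command is printed, not executed.
# Assumption: exporting RUN as an empty string on a cluster node launches
# the real jobs, because $RUN then expands to nothing.
RUN=${RUN-echo}

TABLE=/table2   # existing table; no bulkload attribute is needed

# Start one ImportFiles job per input directory, then wait for all of them.
load_all() {
  for dir in "$@"; do
    $RUN hbase com.mapr.fs.hbase.tools.mapreduce.ImportFiles \
      -inputDir "$dir" -table "$TABLE" &
  done
  wait
}

load_all /test/batch1.kv /test/batch2.kv
```

The jobs run in parallel, so the order in which their output appears is not fixed.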