Optimizing HPE Ezmeral Data Fabric Database Tables Search by ID

Starting from the 1904 release (EEP 6.0.2, EEP 6.1.1, and EEP 6.2.0), search by ID is supported with Hive HPE Ezmeral Data Fabric Database JSON tables.

About this task

Property of Optimization

Prerequisites

The property name is hive.mapr.db.json.fetch.by.id.task.conversion and the value has a boolean type and by default is set to true, which means it is enabled.

Procedure

To disable optimization, set hive.mapr.db.json.fetch.by.id.task.conversion to false.

Conditions for Optimization

Procedure

This optimizer is designed for queries such as:
SELECT *
FROM <mapr_db_json_table>
WHERE _id = <constant_string_value>;
or:
SELECT *
FROM <mapr_db_json_table>
WHERE _id = <constant_string_value> AND (<condition_1>) AND (<condition_2>) ... AND (<condition_N>);
or:
SELECT *
FROM <mapr_db_json_table>
WHERE <Constant false operator>
where _id is a key column of HPE Ezmeral Data Fabric Database JSON table. It provides usage of the findById() method of the HPE Ezmeral Data Fabric Database JSON table. The following functionality is not supported:
  • joins
  • group by
  • distinct
  • lateral view
  • subquery
  • create table as select (CTAS) or insert
  • analyze
  • single source
The predicate is not actually a part of the filter, so it is ignored by push down:
SELECT * FROM t WHERE (CASE WHEN _id = 'value_a' THEN 2 ELSE 4 END) > 3;

Using Optimization

Procedure

  1. Consider the following HPE Ezmeral Data Fabric Database JSON table:
    CREATE TABLE t(doc_id string, col1 string, col2 string) STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' TBLPROPERTIES("maprdb.table.name" = "/user/mapr/db_json_table","maprdb.column.id" = "doc_id");
  2. Run the EXPLAIN command:
    EXPLAIN SELECT col1 FROM t WHERE doc_id='id_004';
  3. The following output is produced:
    STAGE DEPENDENCIES:
     Stage-0 is a root stage
                            
    STAGE PLANS:
     Stage: Stage-0
      MapR DB JSON Fetch By Id Operator
      limit: -1
      Processor Tree:
      TableScan
      alias: t_small
      filterExpr: (doc_id = 'id_004') (type: boolean)
      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      Filter Operator
       predicate: (doc_id = 'id_004') (type: boolean)
       Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
       Select Operator
       expressions: col1 (type: string)
       outputColumnNames: _col0
       Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
       ListSink
    An important part of a query plan is that it shows if optimization is available for the query:
    STAGE PLANS:
    Stage: Stage-0
    MapR DB JSON Fetch By Id Operator