Optimizing Queries with Indexes

HPE Ezmeral Data Fabric Database provides a highly scalable key-value database platform on which you can run SQL queries using Drill. As of the 6.0 release of the MapR Data Platform, HPE Ezmeral Data Fabric Database natively supports indexes on secondary fields in JSON tables.

NOTE HPE Ezmeral Data Fabric Database does not support indexes on binary tables.
An index is a special table that stores a subset of document fields from a JSON table. The primary field in a JSON table is the _id field (unique key field). By default, HPE Ezmeral Data Fabric Database sorts the JSON table by the _id field. All other fields in the JSON table are secondary fields. You can create indexes on the secondary fields in a JSON table to eliminate full tables scans and significanlty improve query performance. See HPE Ezmeral Data Fabric Database as a Document Database and Secondary Index Concepts for more information.

Benefits of Indexes

Well-designed indexes can optimize access to data stored in HPE Ezmeral Data Fabric Database JSON tables and improve performance for high read operations, fast integrated analytics, and complex operational analytics. See Secondary Indexes for more information about the benefits of indexes.

Types of Queries that Benefit from Indexes

Indexes primarily benefit queries with filters in the WHERE clause and queries with an ORDER BY clause for sorting, as described in the following table:
Query Type Description
Equality Equality queries contain equality conditions, such as a=1 and can also include IN. See Equality Queries.
Range Range queries contain range conditions, such as <=, >=, and the LIKE pattern matching condition. See Range Queries.
NOTE The LIKE operator only works on fields that have varchar data types. To use the LIKE operator in queries, use the CAST function to explicitly cast fields to varchar. To use indexes for such queries, create indexes on the cast expressions, as explained in Using Casts in Secondary Indexes.
ORDER BY ORDER BY queries specify a sort order. If the ordering and sorting of the index key list match the ordering specified in a query, the optimizer in Drill does not have to sort the data after the index scan. See ORDER BY Queries.
Multi-index Multi-index queries contain conditions on multiple fields. Drill can scan multiple indexes and use the intersection of the matching documents to optimize these queries. Multi-index queries are an alternative to using composite key indexes. See Multi-Index Queries.

Drill can create index plans for queries with and without filters in the WHERE clause. For example, Drill can create an index plan for an ORDER BY query that does not have filters.

Drill 1.12 and later also supports the following types of queries without filters :
  • GROUP BY
  • JOIN
  • DISTINCT
See Index Planning in Drill for more information.

Types of Indexes Supported by HPE Ezmeral Data Fabric Database

HPE Ezmeral Data Fabric Database supports several types of indexes on JSON tables including simple, composite, hashed, covering, and indexes with the CAST function.
NOTE HPE Ezmeral Data Fabric Database enforces certain restrictions on indexes, such as a limit of 32 KB on the collective size of all indexed keys for each index. See Restrictions on Secondary Indexes for a full list of restrictions and Data Types Supported for Secondary Indexes.
The following table lists the supported index types with brief descriptions and links to topics that provide more information:
Index Type Description
Simple Simple indexes are indexes with a single indexed field (or key). See Simple Indexes.
Composite Composite indexes are indexes that have more than one indexed field (or key). See Composite Indexes.
Hashed Hashed indexes are indexes that distribute keys across logical partitions to avoid the creation of hot spots when HPE Ezmeral Data Fabric Database updates the index with new keys from the JSON table. See Hashed Indexes.
Covering A covering index is an index that allows HPE Ezmeral Data Fabric Database to process a query using only the secondary indexes. HPE Ezmeral Data Fabric Database does not have to read data in the JSON table. See Covering Indexes.
Indexes with the CAST function Indexes with the CAST function convert the indexed field to the data type specified by the CAST function and store the results. See Using Casts in Secondary Indexes.

Steps Required to Use Indexes

To use the index functionality with Drill, complete the following steps:
  1. Install the latest version of the required data-fabric software on the cluster. See Preparing Clusters for Querying using Secondary Indexes on JSON Tables and Installing Drill.
  2. Evaluate your queries and design indexes that support the queries. See Understanding the Secondary Index Workflow and Designing Secondary Indexes.
  3. Create indexes on JSON tables in HPE Ezmeral Data Fabric Database. See Adding Secondary Indexes on JSON Tables and Managing Secondary Indexes.
    NOTE The user that creates indexes on a JSON table must have created the table or have the indexperm permission in addition to readAce on the volume and lookupdir on directories in the table path. If you do not have these permissions, consult with your system administrator.
  4. Issue queries.
  5. Verify that Drill uses the available indexes. See Determining Index Use and Troubleshooting Indexes.

Additional Information