Optimizing Queries with Indexes
MapR Database provides a highly scalable key-value database platform on which you can run SQL queries using Drill. As of the 6.0 release of the MapR Data Platform, MapR Database natively supports indexes on secondary fields in JSON tables.
NOTE MapR Database does not support indexes on binary tables.
An index is a special table
that stores a subset of document fields from a JSON table. The primary field in a JSON table
is the _id
field (unique key field). By default, MapR Database sorts the JSON table
by the _id
field. All other fields in the JSON table are secondary fields.
You can create indexes on the secondary fields in a JSON table to eliminate full tables scans
and significanlty improve query performance. See MapR Database as a Document
Database and Secondary Index
Concepts for more information. Benefits of Indexes
Well-designed indexes can optimize access to data stored in MapR Database JSON tables and improve performance for high read operations, fast integrated analytics, and complex operational analytics. See Secondary Indexes for more information about the benefits of indexes.
Types of Queries that Benefit from Indexes
Indexes primarily benefit queries with filters in the WHERE clause and queries with an
ORDER BY clause for sorting, as described in the following table:
Query Type | Description |
Equality | Equality queries contain equality conditions, such as a=1 and can also include IN. See Equality Queries. |
Range | Range queries contain range conditions, such as <=, >=, and the
LIKE pattern matching condition. See Range
Queries. NOTE The LIKE operator only works on fields that have varchar
data types. To use the LIKE operator in queries, use the CAST function to explicitly cast fields to varchar. To
use indexes for such queries, create indexes on the cast expressions, as
explained in Using Casts in Secondary Indexes. |
ORDER BY | ORDER BY queries specify a sort order. If the ordering and sorting of the index key list match the ordering specified in a query, the optimizer in Drill does not have to sort the data after the index scan. See ORDER BY Queries. |
Multi-index | Multi-index queries contain conditions on multiple fields. Drill can scan multiple indexes and use the intersection of the matching documents to optimize these queries. Multi-index queries are an alternative to using composite key indexes. See Multi-Index Queries. |
Drill can create index plans for queries with and without filters in the WHERE clause. For example, Drill can create an index plan for an ORDER BY query that does not have filters.
Drill 1.12 and later also supports the following types of queries without filters :
- GROUP BY
- JOIN
- DISTINCT
Types of Indexes Supported by MapR Database
MapR Database supports several types of indexes on JSON tables including simple, composite,
hashed, covering, and indexes with the CAST function.
NOTE MapR Database enforces certain restrictions on indexes, such as a limit of 32 KB on the collective size of all
indexed keys for each index. See Restrictions on
Secondary Indexes for a full list of restrictions and Data Types Supported for
Secondary Indexes.
The following table lists the supported index types with brief descriptions and links to
topics that provide more information:
Index Type | Description |
Simple | Simple indexes are indexes with a single indexed field (or key). See Simple Indexes. |
Composite | Composite indexes are indexes that have more than one indexed field (or key). See Composite Indexes. |
Hashed | Hashed indexes are indexes that distribute keys across logical partitions to avoid the creation of hot spots when MapR Database updates the index with new keys from the JSON table. See Hashed Indexes. |
Covering | A covering index is an index that allows MapR Database to process a query using only the secondary indexes. MapR Database does not have to read data in the JSON table. See Covering Indexes. |
Indexes with the CAST function | Indexes with the CAST function convert the indexed field to the data type specified by the CAST function and store the results. See Using Casts in Secondary Indexes. |
Steps Required to Use Indexes
To use the index functionality with Drill, complete the following steps:
- Install the latest version of the required MapR software on the cluster. See Preparing Clusters for Querying using Secondary Indexes on JSON Tables and Installing Drill.
- Evaluate your queries and design indexes that support the queries. See Understanding the Secondary Index Workflow and Designing Secondary Indexes.
- Create indexes on JSON tables in MapR Database. See Adding Secondary
Indexes on JSON Tables and Managing
Secondary Indexes. NOTE The user that creates indexes on a JSON table must have created the table or have the
indexperm
permission in addition toreadAce
on the volume andlookupdir
on directories in the table path. If you do not have these permissions, consult with your system administrator. - Issue queries.
- Verify that Drill uses the available indexes. See Determining Index Use and Troubleshooting Indexes.
Additional Information
- To see how Drill selects a query plan, see Selection and Execution of Secondary Indexes.
- To learn about the index planning and execution configuration options available in Drill, see Index Planning and Execution Configuration Options.
- For information about index architecture, see Implementation of Secondary Indexes.