Types of Secondary Indexes
MapR-DB JSON supports several index types, including simple indexes, composite indexes, hashed indexes, and indexes with casting. This section describes the properties of these indexes and the situations where each provides value.
The following diagram illustrates the different properties of indexes and index fields. Lines connecting properties represent properties that can be used in combination with one another. Click on the text in the diagram for a description of each property.
Indexed vs Included Fields
An index consists of indexed and included fields. Indexed fields are also referred to as index keys. The following lists describe the characteristics of each type of field:
- Indexed Fields
-
- Determine the sort order of the index and the order of the query result when used
- Allow filter conditions and ORDER BY conditions defined on these fields to be optimized
- Included Fields
-
- Do not affect the sort order of the index or the order of the query result
- Prevent reads of the JSON table
In general, you should define indexed fields on fields you filter and order on, and included fields on fields you reference but do not filter and order.
The following example illustrates when you would define an indexed vs an included field in your index. Assume you have a MapR-DB JSON table with the following sample data that contains customer information.
{ "_id": "10000", "FullName": { "LastName": "Smith", "FirstName": "John" }, "Address": { "Street": "123 SE 22nd St.", "City": "Oakland", "State": "CA", "Zipcode": "94601-1001" }, "Gender": "M", "AccountBalance": 999.99, "Email": "john.smith@company.com", "Phones": [ {"Type": "Home", "Number": "555-555-1234"}, {"Type": "Mobile", "Number": "555-555-5678"}, {"Type": "Work", "Number": "555-555-9012"} ], "Hobbies": ["Baseball", "Cooking", "Reading"], "DateOfBirth": "10/1/1985" }
Your query does the following:
- Filters on
Address.Zipcode
- Selects
FullName.FirstName
andFullName.LastName
Because your query filters on Address.Zipcode
, you should include that field as
an indexed field. However because you are only selecting on
FullName.FirstName
and FullName.LastName
, it suffices to
define the FullName
field as an included field.
maprcli table index add -path /customerInfo -index zipCodeIdx \
-indexedfields Address.Zipcode \
-includedfields FullName
There are additional differences in how indexed and included fields behave. The following table summarizes these differences:
Indexed Field | Included Field |
---|---|
There are some restrictions in the data types of indexed fields. See Data Types and Secondary Index Fields for the complete list of types. | Data types of included fields can be any type. There is no data type restriction. |
The collective size of all indexed fields is a maximum of 32KB. | Included fields do not affect the size limit of an index. |
Adding indexed fields increases the cost of key comparisons when scanning the index, due to the increase in the index key size. | Adding included fields does not impact the index scan cost. |
Included fields influence whether an index is a covering index for a query. See Covering Indexes for more information about this concept.
Indexed Field Sort Order
AccountBalance
in descending order,
create the following
index.maprcli table index add -path /customerInfo -index BalanceIdx \
-indexedfields AccountBalance:-1
Simple vs Composite Indexes
Simple indexes are indexes with a single indexed field (or key). Composite indexes have more than one key. In both cases, you can define zero or more included fields. See Simple Indexes and Composite Indexes for additional details.
Hashed vs Non-Hashed Indexes
By default, indexes are stored in sort order across the index key values. This can lead to hotspots if the sort order of the index keys match the order data is inserted into the JSON table. For example, if the indexed field has monotonically increasing timestamp values, such as the date a document is created, the tail end of the index becomes a hotspot. Hashed indexes avoid hotspotting by evenly distributing index writes across a number of logical partitions.
idx
on table,
tab
, with a single key,
idxKeyCol
.maprcli table index add -path /tab -index idx -indexedfields idxKeyCol \
-hashed true
See Hashed Indexes for further details.
Casting
age
field to an INT type and the height
field to
a FLOAT
type.maprcli table index add -path /castTable -index castIdx \
-indexedfields '$CAST(age@INT)','$CAST(height@FLOAT)'
See Using Casts in Secondary Indexes for further details.