HPE Ezmeral Data Fabric 6.1.x is In Maintenance and transitions to "End of Maintenance" in June 2024. Please see the latest documentation.

Secondary Index Concepts

Describes secondary index concepts, including use cases, types of indexes, types of queries that benefit from indexes, and how indexes are implemented.

Indexes created on frequently queried JSON table fields give MapR Database quick access to data. Indexes primarily benefit queries with filters in the WHERE clause, queries with an ORDER BY clause for sorting, and queries where all fields projected in the query are included in the index. They provide the most benefit when an index contains all fields referenced in a query. For filters, indexes reduce the amount of data read. MapR Database implements indexes using JSON tables. Like a JSON table, an index stores data in sorted order, so reading data through the index eliminates the need to sort the data when the index and query sort orders match.
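The effect on filters and sorting can be sketched with a small in-memory model. This is not MapR Database code: the table is a plain Python dict, the index is a sorted list, and the `age` field, the rowkeys, and the function names are all hypothetical and chosen only for illustration.

```python
import bisect

# Hypothetical base table: rowkey -> JSON-like document.
table = {
    "u1": {"name": "Ana", "age": 41},
    "u2": {"name": "Bo", "age": 29},
    "u3": {"name": "Cy", "age": 35},
    "u4": {"name": "Di", "age": 29},
}

# Model of an index on "age": (age, rowkey) entries kept in sorted order.
age_index = sorted((doc["age"], rowkey) for rowkey, doc in table.items())


def filter_without_index(min_age):
    """Full scan: every document is read, then the matches are sorted."""
    docs_read = 0
    hits = []
    for rowkey, doc in table.items():
        docs_read += 1
        if doc["age"] >= min_age:
            hits.append((doc["age"], rowkey))
    return sorted(hits), docs_read


def filter_with_index(min_age):
    """Index scan: jump to the first qualifying entry; no sort is needed."""
    # (min_age,) sorts before any (min_age, rowkey) entry, so this finds
    # the first entry whose age is >= min_age.
    start = bisect.bisect_left(age_index, (min_age,))
    hits = age_index[start:]
    return hits, len(hits)
```

In this model both functions return the same matches in the same order for a filter such as `age >= 35`, but the indexed version reads only the qualifying entries and skips the explicit sort because the index is already ordered by `age`.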

Each JSON table in MapR Database has a unique field that serves as the rowkey. A secondary index contains indexed and included fields. The indexed fields, also referred to as index keys, define the sort order of the index. The index stores the values of the index keys along with the rowkey corresponding to each key value. The rowkey links the index to the JSON table. MapR Database can perform a range scan on the index and then use the corresponding rowkeys to quickly locate data in the JSON table. Additional fields can be included in the index so that queries that need only these included (or covered) fields can get all of their data from the index without accessing the base table.
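The relationship between index keys, rowkeys, and included fields can be sketched in plain Python. This is a simplified illustration under stated assumptions, not how MapR Database stores indexes internally: `build_index`, `range_scan`, the field names, and the sample documents are all hypothetical.

```python
import bisect

# Hypothetical base table keyed by rowkey.
table = {
    "r1": {"city": "Oslo", "name": "Ana", "plan": "gold"},
    "r2": {"city": "Lima", "name": "Bo", "plan": "free"},
    "r3": {"city": "Oslo", "name": "Cy", "plan": "free"},
    "r4": {"city": "Pune", "name": "Di", "plan": "gold"},
}


def build_index(table, indexed_field, included_fields=()):
    """Return (index key, rowkey, included values) entries sorted by key, then rowkey."""
    entries = [
        (doc[indexed_field], rowkey, {f: doc[f] for f in included_fields})
        for rowkey, doc in table.items()
    ]
    entries.sort(key=lambda e: (e[0], e[1]))
    return entries


def range_scan(index, low, high):
    """Return the entries whose index key lies in the closed range [low, high]."""
    keys = [e[0] for e in index]
    return index[bisect.bisect_left(keys, low):bisect.bisect_right(keys, high)]


# Index on "city" (the index key) with "name" as an included field.
city_index = build_index(table, "city", included_fields=("name",))

# Non-covered query: "plan" is not in the index, so each rowkey found by
# the range scan is used to look up the document in the base table.
plans = [table[rowkey]["plan"] for _, rowkey, _ in range_scan(city_index, "Oslo", "Oslo")]

# Covered query: "name" is an included field, so the base table is never read.
names = [included["name"] for _, _, included in range_scan(city_index, "Oslo", "Oslo")]
```

The first query shows the rowkey acting as the link from index to table. The second shows why included fields matter: the query is answered from the index entries alone.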

The following diagram illustrates the mapping. Each index entry consists of the index key value followed by the rowkey of the corresponding JSON document. The color coding highlights the matching index and JSON table entries.

IMPORTANT: Secondary indexes can only be created on MapR Database JSON tables.