HPE Ezmeral Data Fabric 6.1.x is In Maintenance and transitions to "End of Maintenance" in June 2024. Please see the latest documentation.


Change Data Capture

The Change Data Capture (CDC) system allows you to capture changes made to data records in MapR Database tables (JSON or binary) and propagate them to a MapR Event Store For Apache Kafka topic.

These data changes result from inserts, updates, and deletions, and are called change data records. Once the change data records are propagated to a topic, a MapR Event Store For Apache Kafka or Apache Kafka consumer application reads and processes them.

NOTE Within a topic-partition, the change data records appear in the same order as the changes were made to the table. The order is retained because change data records for the same key are propagated to the same topic-partition.
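
The ordering note above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not how MapR Database assigns partitions internally: the partition count, the CRC32-based `partition_for` function, and the `(key, operation)` record shape are all hypothetical and chosen for illustration. It shows why routing every change for a given key to the same partition is enough to preserve the per-key order of changes.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count for the changelog topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a row key to a partition with a stable hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def propagate(changes):
    """Append each (key, operation) change to the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, operation in changes:
        partitions[partition_for(key)].append((key, operation))
    return partitions


def operations_for(partitions, key):
    """Read back the operations for one key from the single partition that holds it."""
    return [op for k, op in partitions[partition_for(key)] if k == key]


# Changes listed in the order they were applied to the table.
table_changes = [
    ("user1", "insert"),
    ("user2", "insert"),
    ("user1", "update"),
    ("user2", "update"),
    ("user1", "delete"),
]

partitions = propagate(table_changes)
print(operations_for(partitions, "user1"))
print(operations_for(partitions, "user2"))
```

Because each key always maps to one partition and records are appended in arrival order, a consumer reading that partition sees the changes for a key in the order they were made, even when changes for other keys are interleaved with them.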

Why Use Change Data Capture?

CDC can be used in many ways, including the following:

To track changes occurring in a MapR Database table and perform real-time processing on the data.
To keep caches, search indexes (such as Elasticsearch or Solr), materialized views, data warehouses, and data marts synchronized in real time with data stored in MapR Database.
To manage separate MapR Database instances for transactional and reporting purposes and keep them in sync in real time for real-time analytics.
To provide arbitrary external systems the ability to globally consume MapR Database table changes.
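
As a rough sketch of the cache-synchronization use case listed above, the following example applies a sequence of change data records to an in-memory cache. The record layout (`type`, `id`, `fields`) is a hypothetical simplification made up for this illustration and is not the actual change data record format; a real application would receive the records from a consumer subscribed to the changelog topic rather than from a hard-coded list.

```python
def apply_change(cache: dict, record: dict) -> None:
    """Apply one change data record (hypothetical shape) to a local cache."""
    operation = record["type"]
    key = record["id"]
    if operation == "insert":
        # A new row: store a copy of its fields.
        cache[key] = dict(record["fields"])
    elif operation == "update":
        # Merge the changed fields into the cached row.
        cache.setdefault(key, {}).update(record["fields"])
    elif operation == "delete":
        # Remove the row if it is cached; ignore it otherwise.
        cache.pop(key, None)
    else:
        raise ValueError(f"unknown change type: {operation!r}")


# Hypothetical change data records, in the order they were read from the topic.
records = [
    {"type": "insert", "id": "user1", "fields": {"name": "Ann", "city": "Oslo"}},
    {"type": "insert", "id": "user2", "fields": {"name": "Bo"}},
    {"type": "update", "id": "user1", "fields": {"city": "Bergen"}},
    {"type": "delete", "id": "user2"},
]

cache = {}
for record in records:
    apply_change(cache, record)

print(cache)
```

Applying the records in the order they were read matters here: replaying the update for a key before its insert, or a delete before an update, would leave the cache in a different state than the source table.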

How Do I Get Started?

The following topics provide the information you need to understand the CDC feature, set up and use CDC, learn the maprcli commands used to perform CDC tasks, and consume the data in your application.