MapR 6.0 Documentation

Change Data Capture

The Change Data Capture (CDC) system allows you to capture changes made to data records in MapR-DB tables (JSON or binary) and propagate them to a MapR-ES topic.

These data changes result from inserts, updates, and deletions, and are called change data records. After the change data records are propagated to a topic, a MapR-ES/Kafka consumer application reads and processes them.

NOTE: The order of the records in the topic-partition is the same as the order of the changes made to the table. The order is retained because change data records for the same key are propagated to the same topic-partition.
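
The ordering guarantee in the note can be illustrated with a small, self-contained simulation. This is only a sketch: the partition count, the sample keys, and the CRC32-modulo partitioning function are assumptions made for illustration, not the scheme MapR-DB or MapR-ES actually uses. It shows that when every change for a given key lands in the same partition, reading that partition back yields the changes for that key in the order they were made.

```python
from collections import defaultdict
from zlib import crc32

NUM_PARTITIONS = 3  # hypothetical partition count for the changelog topic


def partition_for(key: str) -> int:
    """Map a row key to a partition.

    CRC32-modulo is an illustrative assumption, not MapR's documented
    partitioning scheme. What matters is that it is deterministic per key.
    """
    return crc32(key.encode("utf-8")) % NUM_PARTITIONS


def propagate(changes):
    """Append each (key, operation) change to the partition chosen by its key,
    preserving the order in which the changes were made to the table."""
    partitions = defaultdict(list)
    for key, operation in changes:
        partitions[partition_for(key)].append((key, operation))
    return partitions


# Changes in the order they were applied to the (hypothetical) source table.
table_changes = [
    ("user1", "insert"),
    ("user2", "insert"),
    ("user1", "update"),
    ("user2", "delete"),
    ("user1", "delete"),
]

topic = propagate(table_changes)


def history(key):
    """Operations for one key, read back from the single partition that holds it."""
    return [op for k, op in topic[partition_for(key)] if k == key]
```

Calling `history("user1")` returns that key's operations in table order, no matter how many other keys share the same partition.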

Why Use Change Data Capture?

CDC can be used in many ways, including the following:

- To track changes occurring in a MapR-DB table and perform real-time processing on the data.
- To keep caches, search indexes (such as Elasticsearch or Solr), and materialized views up to date, and to synchronize data warehouses or data marts with data stored in MapR-DB in real time.
- To manage separate MapR-DB instances for transactional and reporting purposes, and to keep them in sync in real time for real-time analytics.
- To give arbitrary external systems the ability to consume MapR-DB table changes globally.
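
Keeping a cache or materialized view in sync comes down to replaying change data records, in order, against a derived copy of the data. The sketch below shows that idea with a plain dictionary as the cache. The record layout (`op`, `key`, `fields`) and the `apply_change` helper are hypothetical stand-ins chosen for illustration. They are not the MapR-DB change data record format or API, and a real consumer would read the records from a MapR-ES topic rather than from a list.

```python
def apply_change(cache: dict, record: dict) -> None:
    """Apply one change data record to an in-memory cache.

    The record layout ({"op", "key", "fields"}) is a hypothetical stand-in,
    not the actual MapR-DB change data record format.
    """
    op, key = record["op"], record["key"]
    if op == "insert":
        cache[key] = dict(record["fields"])
    elif op == "update":
        # Merge only the changed fields into the cached copy of the document.
        cache.setdefault(key, {}).update(record["fields"])
    elif op == "delete":
        cache.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


# Records as a consumer might receive them, in table-change order.
records = [
    {"op": "insert", "key": "user1", "fields": {"name": "Ana", "city": "Lima"}},
    {"op": "insert", "key": "user2", "fields": {"name": "Bo"}},
    {"op": "update", "key": "user1", "fields": {"city": "Quito"}},
    {"op": "delete", "key": "user2"},
]

cache = {}
for record in records:
    apply_change(cache, record)
```

Because records for the same key arrive in the order the changes were made, applying them one at a time leaves the cache matching the final state of the source table for the keys it has seen.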

How Do I Get Started?

The following topics provide the information you need to understand the CDC feature, set up and use CDC, run the maprcli commands that perform CDC tasks, and consume the data from your application.

(Topic last modified: 2018-09-18)