Change Data Capture

The Change Data Capture (CDC) system allows you to capture changes made to data records in MapR-DB tables (JSON or binary) and propagate them to a MapR-ES topic.

These data changes are the result of inserts, updates, and deletions and are called change data records. Once the change data records are propagated to a topic, a MapR-ES/Kafka consumer application is used to read and process them.

NOTE: The order of the records in the topic-partition is the same as the order of the changes made to the table. The order is retained because change data records for the same key are propagated to the same topic-partition.
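The ordering guarantee in the note can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: the partition count, the CRC32-based `partition_for` function, and the `(key, operation)` record shape are hypothetical and are not the partitioning logic that MapR-DB or MapR-ES actually uses. It shows only the general principle that routing every change for a key to the same partition preserves that key's change order.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a row key to a partition with a stable hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def propagate(changes, num_partitions: int = NUM_PARTITIONS):
    """Append each (key, operation) change to the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, op in changes:
        partitions[partition_for(key, num_partitions)].append((key, op))
    return partitions


# Changes in the order they were made to the (simulated) table.
changes = [
    ("user1", "insert"),
    ("user2", "insert"),
    ("user1", "update"),
    ("user2", "delete"),
    ("user1", "delete"),
]
partitions = propagate(changes)


def ops_for(key: str):
    """Read back the operations for one key from the partition that holds it."""
    return [op for k, op in partitions[partition_for(key)] if k == key]
```

Because a key always hashes to the same partition, reading that partition returns the key's operations in the order they were applied, regardless of how other keys are distributed.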

Why Use Change Data Capture?

CDC can be used in many ways, including the following:
  • To track changes occurring in a MapR-DB table and perform real-time processing on the data.
  • To keep search indexes (such as Elasticsearch or Solr), caches, materialized views, and data warehouses or data marts synchronized in real time with data stored in MapR-DB.
  • To manage separate MapR-DB instances for transactional and reporting purposes and to keep them in sync in real time for real-time analytics.
  • To give arbitrary external systems the ability to consume MapR-DB table changes globally.
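As a rough illustration of the cache and materialized-view use case, the sketch below replays a sequence of change records against an in-memory dictionary. The record layout (`op`, `key`, `value`) and the function names are hypothetical, chosen for illustration only. They do not reflect the actual change data record format or the MapR-ES/Kafka consumer API, and a real application would read the records from a topic rather than from a list.

```python
def apply_change(cache: dict, record: dict) -> None:
    """Apply one change record to an in-memory cache.

    The record layout is hypothetical: {"op": ..., "key": ..., "value": ...}.
    """
    op = record["op"]
    key = record["key"]
    if op == "insert":
        # Store a copy so later updates do not mutate the source record.
        cache[key] = dict(record["value"])
    elif op == "update":
        # Merge the changed fields into the cached document.
        cache.setdefault(key, {}).update(record["value"])
    elif op == "delete":
        # Deleting a missing key is treated as a no-op.
        cache.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


def replay(records, cache=None) -> dict:
    """Apply records in order and return the resulting cache."""
    cache = {} if cache is None else cache
    for record in records:
        apply_change(cache, record)
    return cache


records = [
    {"op": "insert", "key": "user1", "value": {"name": "Ana", "city": "Lima"}},
    {"op": "update", "key": "user1", "value": {"city": "Quito"}},
    {"op": "insert", "key": "user2", "value": {"name": "Bo"}},
    {"op": "delete", "key": "user2"},
]
cache = replay(records)
```

Replaying the records in order is what keeps the derived copy consistent with the table, which is why the per-key ordering of change data records matters for this use case.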

How Do I Get Started?

The following topics provide the information you need to understand the CDC feature, set up and use CDC, learn the maprcli commands used to perform tasks, and consume the data in your application.

  • Learning about CDC
  • Administering Change Data Capture
  • Consuming CDC changed data records
  • Using dbshell to perform CRUD operations on MapR-DB JSON tables
  • Developing client applications for MapR-DB JSON tables
  • Using hbshell to perform CRUD operations on MapR-DB binary tables
  • Developing client applications for MapR-DB binary tables