Change Data Capture

The Change Data Capture (CDC) system allows you to capture changes made to data records in MapR Database tables (JSON or binary) and propagate them to a MapR Event Store For Apache Kafka topic.

These data changes result from inserts, updates, and deletions, and are called change data records. Once the change data records are propagated to a topic, a MapR Event Store For Apache Kafka or Apache Kafka consumer application reads and processes them.

NOTE The order of the records in the topic-partition is the same as the order of the changes made to the table. The order is retained because change data records for the same key are propagated to the same topic-partition.
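
The ordering guarantee in the note can be illustrated with a small simulation. This is only a sketch: the partition count, the CRC32-based hash, and the function names `partition_for`, `propagate`, and `changes_for_key` are hypothetical and do not reflect how MapR Database actually assigns keys to partitions. It shows only that when every change for a key lands in the same partition, a consumer sees that key's changes in the order they were made.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count, chosen for illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a row key to a partition (illustrative hash, not MapR's actual one)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def propagate(changes, num_partitions: int = NUM_PARTITIONS):
    """Append each change to the partition chosen by its key, in table order."""
    partitions = defaultdict(list)
    for seq, (key, op) in enumerate(changes):
        partitions[partition_for(key, num_partitions)].append((seq, key, op))
    return dict(partitions)


def changes_for_key(partitions, key):
    """Return the operations recorded for one key, in partition order."""
    records = partitions.get(partition_for(key), [])
    return [op for _, k, op in records if k == key]


# Changes in the order they were applied to the (simulated) table.
table_changes = [
    ("user1", "insert"),
    ("user2", "insert"),
    ("user1", "update"),
    ("user2", "delete"),
    ("user1", "delete"),
]

partitions = propagate(table_changes)
print(changes_for_key(partitions, "user1"))  # ['insert', 'update', 'delete']
```

Because a key always maps to one partition, its records are never interleaved across partitions, so no cross-partition reordering is needed on the consumer side.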

Why Use Change Data Capture?

CDC can be used in many ways, including the following:
  • To track changes occurring in a MapR Database table and perform real-time processing on the data.
  • To keep caches, search indexes (such as Elasticsearch or Solr), materialized views, and data warehouses or data marts synchronized in real time with data stored in MapR Database.
  • To manage separate MapR Database instances for transactional and reporting purposes, and to keep them in sync in real time for real-time analytics.
  • To provide arbitrary external systems the ability to globally consume MapR Database table changes.

How Do I Get Started?

The following topics provide the information you need to understand the CDC feature, set up and use CDC, run the maprcli commands that perform CDC tasks, and consume the data in your application.

  • Learning about CDC
  • Administering Change Data Capture
  • Consuming CDC changed data records
  • Using dbshell to perform CRUD operations on MapR Database JSON tables
  • Developing client applications for MapR Database JSON tables
  • Using hbshell to perform CRUD operations on MapR Database binary tables
  • Developing client applications for MapR Database binary tables