Change Data Capture
The Change Data Capture (CDC) system allows you to capture changes made to data records in MapR Database tables (JSON or binary) and propagate them to a MapR Event Store For Apache Kafka topic.
These data changes result from inserts, updates, and deletions and are called change data records. Once the change data records are propagated to a topic, a MapR Event Store For Apache Kafka (or Kafka) consumer application reads and processes them.
NOTE The order of the records in the topic-partition is the same as the order of the changes made to the table. The order is retained because change data records for the same key are propagated to the same topic-partition.
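The ordering guarantee in the note can be illustrated with a small simulation. This is a minimal sketch, not the actual CDC implementation: the partition count, the `partition_for` and `propagate` helpers, and the use of a CRC32 hash are all assumptions made for illustration. It shows only the general idea that routing every change for a given key to the same partition keeps that key's changes in their original order.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a row key to a partition with a stable hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def propagate(changes, num_partitions: int = NUM_PARTITIONS):
    """Append each (key, operation) change to the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, op in changes:
        partitions[partition_for(key, num_partitions)].append((key, op))
    return partitions


# Changes in the order they were made to the (simulated) table.
changes = [
    ("user1", "insert"),
    ("user2", "insert"),
    ("user1", "update"),
    ("user2", "delete"),
    ("user1", "delete"),
]
partitions = propagate(changes)


def ops_for(key: str):
    """Return the operations recorded for one key, in partition order."""
    return [op for k, op in partitions[partition_for(key)] if k == key]


for key in ("user1", "user2"):
    print(key, "-> partition", partition_for(key), ops_for(key))
```

Because a key always maps to one partition, a consumer reading that partition sees the key's insert, update, and delete in the order they occurred. Changes to different keys may land in different partitions, so the sketch implies no ordering between them.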
Why Use Change Data Capture?
CDC can be used in many ways, including the following:
- To track changes occurring in a MapR Database table and perform real-time processing on the data.
- To keep caches, search indexes (such as Elasticsearch or Solr), and materialized views current, and to synchronize data warehouses or data marts with data stored in MapR Database in real time.
- To manage separate MapR Database instances for transactional and reporting purposes and to keep them in sync in real time for real-time analytics.
- To provide arbitrary external systems the ability to globally consume MapR Database table changes.
How Do I Get Started?
The following topics provide the information you need to understand the CDC feature, set up and use CDC, run the maprcli commands that perform CDC tasks, and consume the data in your application.