Thursday, June 3, 2021

Change Data Capture

Change data capture (CDC) is a software process or technology that identifies and tracks changes to data stored in a database, such as inserts, updates, and deletes. While a database is useful for storing the latest state of data, CDC preserves the various states of data over time by providing an audit trail, and it can provide incremental changes to other repositories or applications.

As a very basic example, CDC lets you look up, in December of a given year, what your home address was in January, even if you moved in between and the database now holds only your current address.
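The address lookup can be sketched in a few lines of Python. This is only an illustration, assuming the captured changes are kept as a list of (effective date, address) pairs sorted by date; the names `address_history` and `address_as_of` and the sample addresses are made up for the example.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical change history for one person's address, sorted by effective date.
address_history = [
    (date(2020, 1, 1), "12 Oak Street"),
    (date(2021, 6, 15), "98 Pine Avenue"),
]


def address_as_of(history, when):
    """Return the address in effect on `when`, or None if no record existed yet."""
    effective_dates = [effective for effective, _ in history]
    # Index of the first change that took effect strictly after `when`.
    idx = bisect_right(effective_dates, when)
    return history[idx - 1][1] if idx else None


january = address_as_of(address_history, date(2021, 1, 31))
december = address_as_of(address_history, date(2021, 12, 1))
```

Here `january` resolves to the old address and `december` to the new one, which a table holding only the current value could not tell you.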


How Does Change Data Capture Work?


CDC captures each insert, update, and delete applied to a record and makes that change available either within the database itself or to other applications that rely on the data. CDC tools typically rely on the database’s transaction log, which the database maintains internally to track record changes for system recovery. They read that log to deliver database changes to an external system.


What Are Common Methods of Change Data Capture?

There are different approaches that a system can use to capture changes in data. The use of timestamps is one of the most popular methods of CDC, as most systems track when a row was created and most recently modified.
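A minimal sketch of the timestamp approach, using Python's built-in `sqlite3` module. The `customers` table, its `updated_at` column, and the `poll_changes` helper are assumptions made for illustration; timestamps are stored as ISO 8601 strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, address TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "12 Oak Street", "2021-01-05T09:00:00"),
        (2, "7 Elm Road", "2021-03-10T14:30:00"),
        (3, "98 Pine Avenue", "2021-06-01T08:15:00"),
    ],
)


def poll_changes(conn, last_seen):
    """Return rows modified after `last_seen`, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, address, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # Remember the newest timestamp seen so the next poll starts from there.
    new_mark = rows[-1][2] if rows else last_seen
    return rows, new_mark


changes, watermark = poll_changes(conn, "2021-02-01T00:00:00")
```

Each poll returns only the rows touched since the previous high-water mark. One limitation of this method is that a row that has been physically deleted no longer has a timestamp to query, so deletes are not picked up this way.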

Database transaction logs are also a resource for CDC. Log scanners can identify any changes in these transaction logs. As long as the log scanner can interpret the log, this can be an ideal solution for CDC because it has little impact on the underlying database, delivers changes with low latency, and ensures transaction integrity because every change is tracked in order.
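To illustrate why ordered log entries preserve transaction integrity, here is a toy sketch that replays a simplified log against a replica. Real transaction logs are binary and database-specific; the tuple layout, the `log` contents, and the `apply_log` function are hypothetical.

```python
# Hypothetical, simplified log entries: (sequence number, operation, key, value).
log = [
    (1, "insert", "cust-1", "12 Oak Street"),
    (2, "insert", "cust-2", "7 Elm Road"),
    (3, "update", "cust-1", "98 Pine Avenue"),
    (4, "delete", "cust-2", None),
]


def apply_log(replica, entries, last_applied=0):
    """Apply entries newer than `last_applied` in sequence order; return the new position."""
    for seq, op, key, value in sorted(entries, key=lambda entry: entry[0]):
        if seq <= last_applied:
            continue  # already applied on an earlier pass
        if op == "delete":
            replica.pop(key, None)
        else:
            # Inserts and updates both set the latest value for the key.
            replica[key] = value
        last_applied = seq
    return last_applied


replica = {}
position = apply_log(replica, log)
```

Because every entry is applied in sequence order and the position is remembered, the replica ends in the same state as the source, and calling `apply_log` again with the saved position applies nothing twice.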

As event streaming has gained popularity, so has the publish/subscribe model of CDC, in which database triggers log or publish change events to a table and share those changes with the CDC system. The series of updates that CDC delivers looks like a stream of data, which makes stream processing engines (like Hazelcast Jet) a suitable technology for consuming CDC data.
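The trigger-based variant can be sketched with SQLite, again through Python's `sqlite3` module. The table and trigger names and the `read_events` helper are assumptions for illustration only, and a production setup would hand the events to a messaging or streaming system rather than read them back from the same connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, address TEXT);
CREATE TABLE customer_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT,
    customer_id INTEGER,
    address TEXT
);
CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
    INSERT INTO customer_changes (op, customer_id, address)
    VALUES ('insert', NEW.id, NEW.address);
END;
CREATE TRIGGER customers_upd AFTER UPDATE ON customers BEGIN
    INSERT INTO customer_changes (op, customer_id, address)
    VALUES ('update', NEW.id, NEW.address);
END;
CREATE TRIGGER customers_del AFTER DELETE ON customers BEGIN
    INSERT INTO customer_changes (op, customer_id, address)
    VALUES ('delete', OLD.id, OLD.address);
END;
""")

# Ordinary writes; the triggers record each one as a change event.
conn.execute("INSERT INTO customers VALUES (1, '12 Oak Street')")
conn.execute("UPDATE customers SET address = '98 Pine Avenue' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")


def read_events(conn, after_id=0):
    """Return change events with an id greater than `after_id`, oldest first."""
    return conn.execute(
        "SELECT change_id, op, customer_id, address FROM customer_changes "
        "WHERE change_id > ? ORDER BY change_id",
        (after_id,),
    ).fetchall()


events = read_events(conn)
```

A subscriber that remembers the last `change_id` it processed can call `read_events` repeatedly and consume the table as an ordered stream of inserts, updates, and deletes.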






