𝐂𝐡𝐚𝐧𝐠𝐞 𝐃𝐚𝐭𝐚 𝐂𝐚𝐩𝐭𝐮𝐫𝐞 (𝐂𝐃𝐂) 𝐒𝐮𝐦𝐦𝐚𝐫𝐲 - rnakidi/dsa GitHub Wiki

CDC is a technique used in databases to capture and replicate changes (such as INSERT, UPDATE, and DELETE operations) in real time or near real time. Instead of repeatedly querying entire tables to find updates, downstream systems detect and process only the data that changed, which improves efficiency and performance.

📍 𝐊𝐞𝐲 𝐁𝐞𝐧𝐞𝐟𝐢𝐭𝐬:

  • Real-Time Analytics: Provides immediate insights by capturing live data changes.

  • Resource Efficiency: Reduces load on the source database by processing only changed rows instead of rescanning full tables.

  • Data Synchronization: Ensures all systems are up-to-date with the latest data.

  • System Recovery: Allows a system's state to be reconstructed by replaying the recorded sequence of changes.
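
To make the System Recovery point concrete, here is a minimal Python sketch of rebuilding state by replaying change events in order. The event shape `(operation, key, row)` and the `replay` function are hypothetical, chosen only for illustration; real CDC tools define their own event formats.

```python
from typing import Any, Optional

# Hypothetical change-event shape: (operation, primary key, changed columns or None).
ChangeEvent = tuple[str, int, Optional[dict[str, Any]]]


def replay(events: list[ChangeEvent]) -> dict[int, dict[str, Any]]:
    """Rebuild table state by applying change events in the order they occurred."""
    state: dict[int, dict[str, Any]] = {}
    for op, key, row in events:
        if op == "INSERT":
            state[key] = dict(row or {})
        elif op == "UPDATE":
            # Merge only the changed columns into the existing row.
            state.setdefault(key, {}).update(row or {})
        elif op == "DELETE":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return state


events: list[ChangeEvent] = [
    ("INSERT", 1, {"name": "Ada", "email": "ada@example.com"}),
    ("INSERT", 2, {"name": "Bob", "email": "bob@example.com"}),
    ("UPDATE", 1, {"email": "ada@new.example.com"}),
    ("DELETE", 2, None),
]

current = replay(events)
print(current)  # {1: {'name': 'Ada', 'email': 'ada@new.example.com'}}
```

Replaying a prefix of the list, for example `replay(events[:2])`, yields the state as it was at that earlier point, which is what makes an ordered change stream useful for recovery.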

📍 𝐓𝐲𝐩𝐞𝐬 𝐨𝐟 𝐂𝐃𝐂:

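  1. Trigger-Based: Uses database triggers to record each change in a separate change table. Simple to set up, but the triggers add work to every write.

  2. Log-Based: Reads changes directly from the database's transaction log (for example, the MySQL binlog or the PostgreSQL write-ahead log), so it adds little load to normal queries.

  3. Timestamp-Based: Polls for rows whose timestamp column (such as a last-updated time) is newer than the previous sync. Easy to implement, but it cannot see rows that were physically deleted.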
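
The trigger-based and timestamp-based approaches can be sketched with Python's built-in `sqlite3` module. Everything named here (`customers`, `customers_changes`, the trigger names, and the integer `updated_at` used as a stand-in for a real timestamp) is an assumption made for the example, not a prescribed schema. Log-based CDC is not shown because it depends on database-specific access to the transaction log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    updated_at INTEGER NOT NULL  -- stand-in for a real last-updated timestamp
);

-- Change table filled by the triggers below (trigger-based CDC).
CREATE TABLE customers_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT NOT NULL,
    id INTEGER NOT NULL,
    name TEXT
);

CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('INSERT', NEW.id, NEW.name);
END;
CREATE TRIGGER customers_au AFTER UPDATE ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('UPDATE', NEW.id, NEW.name);
END;
CREATE TRIGGER customers_ad AFTER DELETE ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('DELETE', OLD.id, NULL);
END;
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada', 100)")
conn.execute("INSERT INTO customers VALUES (2, 'Bob', 101)")
last_seen = 101  # high-water mark recorded by the previous timestamp-based sync

conn.execute("UPDATE customers SET name = 'Ada L.', updated_at = 102 WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 2")

# Trigger-based: every operation was recorded, in order, including the delete.
trigger_changes = conn.execute(
    "SELECT op, id, name FROM customers_changes ORDER BY seq"
).fetchall()

# Timestamp-based: only rows modified since the last sync that still exist.
timestamp_changes = conn.execute(
    "SELECT id, name FROM customers WHERE updated_at > ? ORDER BY id",
    (last_seen,),
).fetchall()

print(trigger_changes)
print(timestamp_changes)
```

In this run the trigger table holds all four operations, while the timestamp query returns only the updated row for id 1. The deleted row for id 2 never shows up in the timestamp query, which illustrates why polling on a timestamp column is usually paired with soft deletes or another mechanism.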

📍 𝐂𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞𝐬:

  • Data Integrity: Ensuring all changes are accurately captured.

  • Scalability: Adapting to growing data volumes.

  • Latency: Minimizing delay in data propagation.

📍 𝐓𝐨𝐨𝐥𝐬:

  • Kafka: A distributed event streaming platform, well suited to transporting change events from source databases to downstream consumers.

  • Debezium: An open-source CDC tool that integrates with Kafka to stream changes from various databases.
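
As a rough illustration of what a consumer of Debezium events does, the Python sketch below parses a change-event value and works out which columns changed. Debezium's event envelope carries `before`, `after`, `source`, `op`, and `ts_ms` fields, with `op` codes `c`, `u`, `d`, and `r` for create, update, delete, and snapshot read. The sample message is hand-written with made-up values, `parse_change_event` and `OP_NAMES` are hypothetical names, and in practice the raw string would arrive from a Kafka consumer rather than a literal.

```python
import json
from typing import Any, Optional

# Debezium operation codes mapped to readable names (labels are our own choice).
OP_NAMES = {"c": "INSERT", "u": "UPDATE", "d": "DELETE", "r": "SNAPSHOT"}

Row = Optional[dict[str, Any]]


def parse_change_event(raw: str) -> tuple[str, Row, Row]:
    """Return (operation, row before the change, row after the change)."""
    message = json.loads(raw)
    # With JSON schemas enabled the envelope sits under "payload";
    # otherwise the envelope is the message itself.
    envelope = message.get("payload", message)
    return OP_NAMES[envelope["op"]], envelope.get("before"), envelope.get("after")


# Hand-written sample in the shape of a Debezium update event (illustrative values).
sample = json.dumps({
    "payload": {
        "before": {"id": 1, "email": "ada@example.com"},
        "after": {"id": 1, "email": "ada@new.example.com"},
        "source": {"connector": "postgresql", "table": "customers"},
        "op": "u",
        "ts_ms": 1700000000000,
    }
})

op, before, after = parse_change_event(sample)

# For an update, compare the two row images to find the changed columns.
changed = {k: v for k, v in (after or {}).items() if (before or {}).get(k) != v}

print(op, changed)  # UPDATE {'email': 'ada@new.example.com'}
```

Whether `before` is populated for updates and deletes depends on the connector and database configuration, so a real consumer should treat it as optional, as the sketch does.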

CDC is increasingly vital to modern data strategies: it delivers data in near real time, keeps systems consistent with one another, and aids recovery.

Source/Credit: https://www.linkedin.com/posts/ashish--joshi_%F0%9D%90%82%F0%9D%90%A1%F0%9D%90%9A%F0%9D%90%A7%F0%9D%90%A0%F0%9D%90%9E-%F0%9D%90%83%F0%9D%90%9A%F0%9D%90%AD%F0%9D%90%9A-%F0%9D%90%82%F0%9D%90%9A%F0%9D%90%A9%F0%9D%90%AD%F0%9D%90%AE%F0%9D%90%AB%F0%9D%90%9E-%F0%9D%90%82%F0%9D%90%83%F0%9D%90%82-activity-7278261041208193027-Eoj-?utm_source=share&utm_medium=member_desktop