𝐂𝐡𝐚𝐧𝐠𝐞 𝐃𝐚𝐭𝐚 𝐂𝐚𝐩𝐭𝐮𝐫𝐞 (𝐂𝐃𝐂) 𝐒𝐮𝐦𝐦𝐚𝐫𝐲 - rnakidi/dsa GitHub Wiki
CDC is a technique used in databases to capture and replicate changes (such as INSERT, UPDATE, and DELETE operations) in real time or near real time. Instead of querying entire tables for updates, CDC allows systems to automatically detect and process only the changed data, improving efficiency and performance.
📍 𝐊𝐞𝐲 𝐁𝐞𝐧𝐞𝐟𝐢𝐭𝐬:
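- Real-Time Analytics: Delivers fresh insights by streaming data changes to analytics systems as they happen, instead of waiting for periodic batch loads.
- Resource Efficiency: Reduces load on the source database by processing only changed rows instead of repeatedly scanning entire tables.
- Data Synchronization: Keeps downstream systems (caches, search indexes, data warehouses, replicas) up to date with the source.
- System Recovery: Allows a system's state to be reconstructed by replaying an ordered sequence of changes.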
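The recovery benefit rests on replaying changes in order. Below is a minimal sketch, assuming each change event carries a hypothetical, monotonically increasing `seq` number and a full row image; `ChangeEvent` and `replay` are illustrative names, not part of any CDC tool. Applying events up to a chosen sequence number rebuilds the table as it stood at that point.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    seq: int                         # hypothetical ordering key (e.g. a log position)
    op: str                          # "INSERT", "UPDATE" or "DELETE"
    key: int                         # primary key of the affected row
    row: Optional[dict[str, Any]]    # full row image after the change; None for DELETE


def replay(events: list[ChangeEvent], up_to_seq: int) -> dict[int, dict[str, Any]]:
    """Rebuild table state by applying events in seq order, up to and including up_to_seq."""
    state: dict[int, dict[str, Any]] = {}
    for ev in sorted(events, key=lambda e: e.seq):
        if ev.seq > up_to_seq:
            break  # stop at the requested point in time
        if ev.op in ("INSERT", "UPDATE"):
            state[ev.key] = dict(ev.row)  # copy so the returned state is independent of the log
        elif ev.op == "DELETE":
            state.pop(ev.key, None)
        else:
            raise ValueError(f"unknown op: {ev.op}")
    return state


log = [
    ChangeEvent(1, "INSERT", 1, {"name": "alice", "balance": 100}),
    ChangeEvent(2, "INSERT", 2, {"name": "bob", "balance": 50}),
    ChangeEvent(3, "UPDATE", 1, {"name": "alice", "balance": 80}),
    ChangeEvent(4, "DELETE", 2, None),
]

print(replay(log, up_to_seq=2))  # {1: {'name': 'alice', 'balance': 100}, 2: {'name': 'bob', 'balance': 50}}
print(replay(log, up_to_seq=4))  # {1: {'name': 'alice', 'balance': 80}}
```

Sorting by `seq` before applying matters: the same events applied in a different order would produce a different, incorrect state.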
📍 𝐓𝐲𝐩𝐞𝐬 𝐨𝐟 𝐂𝐃𝐂:
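- Trigger-Based: Uses database triggers to copy each INSERT, UPDATE, and DELETE into a separate change table; simple to set up, but adds write overhead to every transaction on the source.
- Log-Based: Reads changes directly from the database's transaction log (for example, the MySQL binlog or the PostgreSQL write-ahead log), capturing every change with minimal impact on the source.
- Timestamp-Based: Queries for rows whose last-modified timestamp column is newer than the previous sync point; easy to implement, but it cannot detect hard deletes and misses intermediate updates made between polls.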
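The trigger-based approach can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not a production design: the `customers` table, the `customers_changes` change table and its columns, and the `fetch_changes` helper are hypothetical names chosen for the example. Each trigger appends one row to the change table, and a consumer reads only rows newer than the last `change_id` it processed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

-- Hypothetical change table written by the triggers below.
CREATE TABLE customers_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- consumer's read position
    op        TEXT NOT NULL,                      -- 'INSERT', 'UPDATE' or 'DELETE'
    id        INTEGER NOT NULL,                   -- primary key of the affected row
    name      TEXT                                -- row image after the change (NULL for DELETE)
);

CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('INSERT', NEW.id, NEW.name);
END;

CREATE TRIGGER customers_au AFTER UPDATE ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('UPDATE', NEW.id, NEW.name);
END;

CREATE TRIGGER customers_ad AFTER DELETE ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('DELETE', OLD.id, NULL);
END;
""")


def fetch_changes(conn: sqlite3.Connection, after_change_id: int) -> list[tuple]:
    """Return change rows newer than the consumer's last processed change_id, in order."""
    return conn.execute(
        "SELECT change_id, op, id, name FROM customers_changes "
        "WHERE change_id > ? ORDER BY change_id",
        (after_change_id,),
    ).fetchall()


# Ordinary application writes; the triggers record each one in the same transaction.
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'alice')")
conn.execute("UPDATE customers SET name = 'alicia' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")
conn.commit()

for change in fetch_changes(conn, after_change_id=0):
    print(change)
# (1, 'INSERT', 1, 'alice')
# (2, 'UPDATE', 1, 'alicia')
# (3, 'DELETE', 1, None)
```

The DELETE row illustrates the contrast with the timestamp-based approach: a poll such as `WHERE updated_at > ?` against `customers` would never see that change, because the deleted row no longer exists to be selected. Log-based capture depends on engine-specific log formats and is not shown here.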
📍 𝐂𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞𝐬:
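- Data Integrity: Ensuring every change is captured accurately, without loss or duplication, and applied in the correct order.
- Scalability: Keeping up with growing data volumes and change rates.
- Latency: Minimizing the delay between a change at the source and its arrival in downstream systems.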
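One common way to address the data-integrity challenge, assuming at-least-once delivery (so events may arrive more than once), is for the consumer to track the position of the last change it applied. The sketch below uses a hypothetical `seq` number that increases by exactly one per change; real systems track log positions such as PostgreSQL LSNs or MySQL binlog offsets, which are ordered but not necessarily contiguous, so the gap check would need adapting. `ReplicaApplier` is an illustrative name only.

```python
from typing import Any, Optional


class ReplicaApplier:
    """Apply change events to an in-memory replica at most once each and strictly in order.

    Assumes a hypothetical `seq` number that increases by exactly 1 per change.
    """

    def __init__(self) -> None:
        self.last_seq = 0                          # position of the last applied change
        self.rows: dict[int, dict[str, Any]] = {}  # replica contents keyed by primary key

    def apply(self, seq: int, op: str, key: int, row: Optional[dict[str, Any]]) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        if seq <= self.last_seq:
            return False  # redelivered event: already applied, so skip it
        if seq != self.last_seq + 1:
            # A missing event would silently corrupt the replica, so fail loudly instead.
            raise RuntimeError(f"gap detected: expected seq {self.last_seq + 1}, got {seq}")
        if op in ("INSERT", "UPDATE"):
            self.rows[key] = dict(row)
        elif op == "DELETE":
            self.rows.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op}")
        self.last_seq = seq
        return True


applier = ReplicaApplier()
applier.apply(1, "INSERT", 7, {"status": "new"})
applier.apply(2, "UPDATE", 7, {"status": "paid"})
print(applier.apply(2, "UPDATE", 7, {"status": "paid"}))  # False: redelivery ignored
print(applier.rows)  # {7: {'status': 'paid'}}
try:
    applier.apply(4, "DELETE", 7, None)  # event 3 never arrived
except RuntimeError as err:
    print(err)  # gap detected: expected seq 3, got 4
```

In a real pipeline, `last_seq` would be persisted atomically with the replica's data so that a restart resumes from the correct position.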
📍 𝐓𝐨𝐨𝐥𝐬:
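- Apache Kafka: A distributed event-streaming platform well suited to transporting and buffering the flow of change events.
- Debezium: An open-source, log-based CDC platform, typically run as Kafka Connect source connectors, that streams row-level changes from databases such as MySQL, PostgreSQL, MongoDB, and SQL Server into Kafka.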
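Debezium publishes each row change as an event whose payload holds `before` and `after` row images and an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read). The sketch below applies such events to an in-memory replica using only the standard library. It assumes the table's primary key column is named `id` (hypothetical), omits the `schema`, `source`, and `ts_ms` fields for brevity, and leaves out the Kafka consumer that would supply the message values in practice; `apply_debezium_event` is an illustrative helper, not a Debezium API.

```python
import json
from typing import Any, Optional


def apply_debezium_event(replica: dict[Any, dict[str, Any]], raw_value: Optional[str]) -> None:
    """Apply one Debezium-style change event (a JSON message value) to an in-memory replica."""
    if raw_value is None:
        return  # tombstone emitted after a delete (used for Kafka log compaction): nothing to apply
    event = json.loads(raw_value)
    payload = event.get("payload", event)  # unwrap if the converter includes a schema envelope
    op = payload["op"]
    if op in ("c", "r", "u"):              # create, snapshot read, update: upsert the new image
        row = payload["after"]
        replica[row["id"]] = row           # assumes a primary key column named `id`
    elif op == "d":                         # delete: only `before` is populated
        replica.pop(payload["before"]["id"], None)
    # Other op codes are ignored in this sketch.


replica: dict[Any, dict[str, Any]] = {}
apply_debezium_event(replica, '{"payload": {"before": null, "after": {"id": 1, "email": "a@example.com"}, "op": "c"}}')
apply_debezium_event(replica, '{"payload": {"before": {"id": 1, "email": "a@example.com"}, "after": {"id": 1, "email": "b@example.com"}, "op": "u"}}')
print(replica)  # {1: {'id': 1, 'email': 'b@example.com'}}
apply_debezium_event(replica, '{"payload": {"before": {"id": 1, "email": "b@example.com"}, "after": null, "op": "d"}}')
apply_debezium_event(replica, None)  # tombstone for the deleted key
print(replica)  # {}
```

Treating `c`, `r`, and `u` uniformly as upserts keeps the consumer idempotent, so replaying the same event twice leaves the replica unchanged.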
CDC is increasingly central to modern data architectures: it keeps downstream data fresh, keeps systems consistent with their sources, and supports recovery by preserving an ordered history of changes.