incremental load
- Since the data size is large, we can't always import a whole table from the source into the staging area; instead, we use an incremental load.
- Incremental load is a data processing technique used to efficiently transfer only the new or changed data from a source system to a target system, rather than transferring the entire dataset. This is especially useful when dealing with large volumes of data, as it minimizes the load on network resources and reduces processing time.
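A minimal sketch of the idea, assuming a timestamp column (`updated_at`) on the source table and using SQLite as a stand-in for the real source and staging databases; the table names `source_orders`, `staging_orders`, and `load_watermark` are invented for the example:

```python
import sqlite3

def incremental_load(conn):
    """Copy only source rows changed since the last recorded watermark."""
    cur = conn.cursor()
    # High-water mark left by the previous run; 0 means "load everything".
    watermark = cur.execute(
        "SELECT COALESCE(MAX(last_loaded_at), 0) FROM load_watermark"
    ).fetchone()[0]
    # Transfer only new or changed rows instead of the whole table.
    cur.execute(
        "INSERT INTO staging_orders "
        "SELECT id, amount, updated_at FROM source_orders WHERE updated_at > ?",
        (watermark,),
    )
    loaded = cur.rowcount
    # Advance the watermark so the next run skips what was just loaded.
    new_mark = cur.execute(
        "SELECT COALESCE(MAX(updated_at), ?) FROM source_orders", (watermark,)
    ).fetchone()[0]
    cur.execute("INSERT INTO load_watermark VALUES (?)", (new_mark,))
    conn.commit()
    return loaded

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_orders  (id INTEGER, amount REAL, updated_at INTEGER);
    CREATE TABLE staging_orders (id INTEGER, amount REAL, updated_at INTEGER);
    CREATE TABLE load_watermark (last_loaded_at INTEGER);
    INSERT INTO source_orders VALUES (1, 9.99, 100), (2, 5.00, 200);
""")
print(incremental_load(conn))  # 2 -> first run loads everything
conn.execute("INSERT INTO source_orders VALUES (3, 7.50, 300)")
print(incremental_load(conn))  # 1 -> second run picks up only the new row
```

The key design point is the persisted watermark: saving the high-water mark is what lets each run pick up exactly where the previous one stopped.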
Key Concepts of Incremental Load:
- Change Data Capture (CDC): identify the changes in the source data since the last load, using techniques such as database triggers, timestamps, or log-based methods (the watermark sketch above is a timestamp-based example).
- Staging Area: a temporary storage area where data is processed before being moved to its final destination; this is where the incremental changes are gathered (see the merge sketch after this list).
- Batch Processing: incremental loads are often scheduled at regular intervals (e.g., hourly, daily) so the target system stays up to date without being overwhelmed (see the scheduling sketch below).
- Error Handling: robust error handling is important for managing issues during the incremental load and preserving data integrity (see the transaction sketch below).
- Performance Optimization: indexes and partitioning can be used to speed up the incremental load process (see the index sketch below).
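For the staging-area step, a common pattern is to merge the gathered changes into the final table so that re-delivered rows update in place instead of duplicating. A sketch using SQLite's upsert as a stand-in for a warehouse `MERGE`; the table and column names are again illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER);
    CREATE TABLE target_orders  (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER);
    INSERT INTO target_orders  VALUES (1, 9.99, 100);
    INSERT INTO staging_orders VALUES (1, 12.00, 150), (2, 5.00, 200);
""")

# Insert new ids, update existing ones. The WHERE true clause is required
# by SQLite's parser when an upsert reads from a SELECT.
conn.execute("""
    INSERT INTO target_orders (id, amount, updated_at)
    SELECT id, amount, updated_at FROM staging_orders WHERE true
    ON CONFLICT(id) DO UPDATE SET
        amount     = excluded.amount,
        updated_at = excluded.updated_at
""")
conn.commit()
print(conn.execute("SELECT * FROM target_orders ORDER BY id").fetchall())
# [(1, 12.0, 150), (2, 5.0, 200)]
```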
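For batch processing, the load is typically driven by a scheduler rather than run by hand. A bare-bones loop as a sketch; in practice this would usually be a cron job or an orchestrator such as Airflow, and `incremental_load` and `conn` are reused from the watermark sketch above:

```python
import time

# Rerun the incremental load on a fixed cadence; each run only moves the
# rows that appeared since the previous watermark.
while True:
    loaded = incremental_load(conn)
    print(f"loaded {loaded} rows")
    time.sleep(3600)  # hourly
```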
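For error handling, the simplest safeguard is to load each batch inside a transaction, so a failure leaves the staging table unchanged and the batch can be retried. A minimal sketch, again with invented table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)"
)

def load_batch(conn, rows):
    """Insert one batch atomically: every row lands, or none do."""
    try:
        with conn:  # commit on success, rollback on any exception
            conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError as exc:
        # The rollback left staging unchanged, so the batch can be fixed
        # and retried without duplicates or half-loaded data.
        print(f"batch rejected: {exc}")
        return False

load_batch(conn, [(1, 9.99, 100), (2, 5.00, 200)])  # True
load_batch(conn, [(3, 7.50, 300), (1, 0.00, 400)])  # False: duplicate id 1
print(conn.execute("SELECT COUNT(*) FROM staging_orders").fetchone()[0])  # still 2
```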
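For performance, the most useful addition to the watermark pattern is usually an index on the change-tracking column, so the `updated_at > ?` filter becomes a range seek instead of a full scan; partitioning by date plays the same role in engines that support it. A sketch with the same illustrative schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_orders (id INTEGER, amount REAL, updated_at INTEGER)")

# Without this index every incremental run scans the whole source table to
# find rows newer than the watermark; with it, the lookup is a range seek.
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_source_orders_updated_at "
    "ON source_orders (updated_at)"
)
```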
- https://www.youtube.com/watch?v=RGSKeK9xow0&ab_channel=CloudQuickLabs
- https://www.youtube.com/watch?v=R-1go56ip5g&ab_channel=CloudQuickLabs