etl (transform,load) - sachit914/datawarehouse GitHub Wiki

data unification
in different zone we have different tables with same table structure
so in data warehouse we can put data from both zone in same structure
dropping duplication
data type convertion
dropping empty rows

Load in ETL

Initial Load:

This is the first load of data from source systems into the data warehouse.
It typically involves transferring all historical data from the source to the data warehouse.
Initial loads are usually more resource-intensive and time-consuming compared to incremental loads because they involve large volumes of data.

Incremental Load:

After the initial load, subsequent loads are typically incremental.
Incremental loads involve transferring only the changed or newly added data since the last load.
This approach is more efficient in terms of time and resources, as it reduces the amount of data transferred and processed compared to the initial load.

types of incremental update

Append (Insert-Only):

Adds new data from the source to the existing data in the data warehouse.
Does not update or delete existing records; only inserts new records.
Simple and maintains historical data but can lead to data duplication.

In-Place Update:

Updates existing records in the data warehouse based on changes in the source.
Efficient for handling updates but does not retain historical versions of data.

Complete Replacement (Overwrite):

Replaces the entire dataset in the data warehouse with a new version from the source.
Ensures the data warehouse reflects the exact state of the source system but may lead to data loss if not managed carefully.

Rolling Update (Merge or Upsert):

Combines insert and update operations.
Inserts new records and updates existing records that have changed in the source.
Balances performance and data currency, ensuring the data warehouse reflects the latest data effectively.