Build - davidkhala/data-warehouse GitHub Wiki

General data warehouse load task

  1. Ingest the new data to be loaded into a data lake, applying pre-load cleansing or transformations as required.
  2. Load the data from files into staging tables in the relational data warehouse.
  3. Load the dimension tables from the dimension data in the staging tables, updating existing rows or inserting new rows and generating surrogate key values as necessary.
  4. Load the fact tables from the fact data in the staging tables, looking up the appropriate surrogate keys for related dimensions.
  5. Perform post-load optimization by updating indexes and table distribution statistics.
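The steps above can be sketched end to end as below. This is a minimal illustration, not the repository's implementation. It uses Python's built-in `sqlite3` as a stand-in for the relational data warehouse. Step 1 is not modelled: inline CSV strings take the place of files already landed in the data lake. All table and column names (`stg_customer`, `stg_sales`, `dim_customer`, `fact_sales`, `customer_key`, and so on) are hypothetical. The sketch also assumes:

- Changed dimension attributes are overwritten in place (a Type 1 slowly changing dimension).
- Sales whose customer is missing from the dimension are skipped.
- SQLite's `REINDEX` and `ANALYZE` stand in for step 5. A dedicated warehouse engine would use its own index-maintenance and statistics commands.

```python
import csv
import io
import sqlite3

SCHEMA = """
CREATE TABLE stg_customer (customer_id TEXT, name TEXT, city TEXT);
CREATE TABLE stg_sales (sale_id TEXT, customer_id TEXT, amount REAL);
-- customer_key is the surrogate key; customer_id is the business key.
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,  -- SQLite assigns this on insert
    customer_id TEXT NOT NULL UNIQUE,
    name TEXT,
    city TEXT
);
CREATE TABLE fact_sales (
    sale_id TEXT PRIMARY KEY,
    customer_key INTEGER NOT NULL REFERENCES dim_customer (customer_key),
    amount REAL
);
CREATE INDEX ix_fact_sales_customer ON fact_sales (customer_key);
"""


def load_staging(conn, table, csv_text):
    """Step 2: copy rows from a landed CSV file into a staging table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cols = reader.fieldnames  # table and column names are trusted in this sketch
    rows = [tuple(r[c] for c in cols) for r in reader]
    placeholders = ", ".join(["?"] * len(cols))
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})", rows)


def load_dimension(conn):
    """Step 3: update existing members, insert new ones (Type 1 overwrite)."""
    conn.execute("""
        UPDATE dim_customer
        SET name = (SELECT s.name FROM stg_customer s
                    WHERE s.customer_id = dim_customer.customer_id),
            city = (SELECT s.city FROM stg_customer s
                    WHERE s.customer_id = dim_customer.customer_id)
        WHERE customer_id IN (SELECT customer_id FROM stg_customer)""")
    # New business keys get a fresh surrogate key from INTEGER PRIMARY KEY.
    conn.execute("""
        INSERT INTO dim_customer (customer_id, name, city)
        SELECT s.customer_id, s.name, s.city FROM stg_customer s
        WHERE NOT EXISTS (SELECT 1 FROM dim_customer d
                          WHERE d.customer_id = s.customer_id)""")


def load_facts(conn):
    """Step 4: replace each business key with the dimension's surrogate key."""
    # The inner join skips sales whose customer is missing from the dimension.
    conn.execute("""
        INSERT INTO fact_sales (sale_id, customer_key, amount)
        SELECT s.sale_id, d.customer_key, s.amount
        FROM stg_sales s JOIN dim_customer d ON d.customer_id = s.customer_id""")


def run_load(conn, customer_csv, sales_csv):
    load_staging(conn, "stg_customer", customer_csv)
    load_staging(conn, "stg_sales", sales_csv)
    load_dimension(conn)
    load_facts(conn)
    conn.execute("DELETE FROM stg_customer")  # empty staging for the next batch
    conn.execute("DELETE FROM stg_sales")
    conn.commit()
    # Step 5: rebuild indexes and refresh the query planner's statistics.
    conn.execute("REINDEX")
    conn.execute("ANALYZE")


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
run_load(conn,
         "customer_id,name,city\nC1,Alice,Paris\nC2,Bob,Berlin\n",
         "sale_id,customer_id,amount\nS1,C1,100.0\nS2,C2,250.0\nS3,C1,75.5\n")
totals = conn.execute("""
    SELECT d.name, SUM(f.amount) FROM fact_sales f
    JOIN dim_customer d USING (customer_key)
    GROUP BY d.name ORDER BY d.name""").fetchall()
print(totals)  # [('Alice', 175.5), ('Bob', 250.0)]
```

Loading a second batch through the same `run_load` shows the upsert path. An existing customer keeps its surrogate key while its attributes are updated, and a new customer receives the next key. Keeping dimension maintenance (step 3) before the fact load (step 4) is what guarantees every incoming fact can find its key.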