20220516 Udemy Data warehouse Fundamentals for beginners - 1001fufu/Learning-Log GitHub Wiki

Study Link: https://www.udemy.com/course/data-warehouse-fundamentals-for-beginners/learn/lecture/17728292#overview 2 important aspects: data warehouse architecture + design engineering.


Different data warehouse design


Data warehouse: centralized, component-based centralized: EDW (relations & non relational), Data Lake (non relational, big data) component-based: architect (DW+DM, DM-> DM Bus), non architect (federated DMs) image Operational Data Store (ODS): focus on current data, often real-time data source.


How data bring value to organization


BI + Data Warehouse/Data Lake/ODS


Different situations for different design at staging layer


Staging layer could be persistent or non-persistent.

Staging layer should focus on ** E(Extraction)**, and not T(Transformation). image

Staging layer could be persistent or non-persistent.

image image


Design Engineer



Star Schema vs Snowflake Schema


image

image

Natural key travels from source system to data warehouse. Surrogate key: generate by database itself or key management system, no business meaning. image Best practice: use Surrogate key.

DIMENSION TABLE

SNOW FLAKE VS STAR Schema SNOW flake has more tables, primary keys/foreign keys

FACT TABLE

  1. Transaction Fact tables: facts/measurements from transaction
  2. Periodic Snapshot Fact tables: track a given measurement at regular intervals
  3. Accumulating Snapshot Fact tables: track the progress of business process through formally defined stages.
  4. Factless Fact tables: -> record occurrence of a transaction that has no measurements -> record coverage or eligibility relationships

Rules governing fact tables

  1. facts are available same grain (level of details)
  2. facts occur simultaneously

Primary key

Dimension table: primary key -> surrogate key, foreign key is other dim table's primary key Fact table: primary key -> combination of all foreign keys relating back to dim tables, even if fact table has natural key.

Accumulating Snapshot Fact tables

image

Factless Fact Table

  1. first type: PK/FK only (sounds like association table) image

Compare fact tables in star vs snowflake schema

Do not do centipede fat tables. image

Do this image

SCD - Slowly changing dimension

image image

  • Type 1: new table
  • Type 2: new row (surrogate key, 1: current_flag, 2: effective & expiry date)
  • Type 3: new column (current division, previous division, every row change at the same time, like reorganization)

image