20220516 Udemy Data warehouse Fundamentals for beginners - 1001fufu/Learning-Log GitHub Wiki
Study Link: https://www.udemy.com/course/data-warehouse-fundamentals-for-beginners/learn/lecture/17728292#overview 2 important aspects: data warehouse architecture + design engineering.
Different data warehouse design
Data warehouse: centralized, component-based
centralized: EDW (relations & non relational), Data Lake (non relational, big data)
component-based: architect (DW+DM, DM-> DM Bus), non architect (federated DMs)
Operational Data Store (ODS): focus on current data, often real-time data source.
How data bring value to organization
BI + Data Warehouse/Data Lake/ODS
Different situations for different design at staging layer
Staging layer could be persistent or non-persistent.
Staging layer should focus on ** E(Extraction)**, and not T(Transformation).
Staging layer could be persistent or non-persistent.
Design Engineer
Star Schema vs Snowflake Schema
Natural key travels from source system to data warehouse.
Surrogate key: generate by database itself or key management system, no business meaning.
Best practice: use Surrogate key.
DIMENSION TABLE
SNOW FLAKE VS STAR Schema SNOW flake has more tables, primary keys/foreign keys
FACT TABLE
- Transaction Fact tables: facts/measurements from transaction
- Periodic Snapshot Fact tables: track a given measurement at regular intervals
- Accumulating Snapshot Fact tables: track the progress of business process through formally defined stages.
- Factless Fact tables: -> record occurrence of a transaction that has no measurements -> record coverage or eligibility relationships
Rules governing fact tables
- facts are available same grain (level of details)
- facts occur simultaneously
Primary key
Dimension table: primary key -> surrogate key, foreign key is other dim table's primary key Fact table: primary key -> combination of all foreign keys relating back to dim tables, even if fact table has natural key.
Accumulating Snapshot Fact tables
Factless Fact Table
- first type: PK/FK only (sounds like association table)
Compare fact tables in star vs snowflake schema
Do not do centipede fat tables.
Do this
SCD - Slowly changing dimension
- Type 1: new table
- Type 2: new row (surrogate key, 1: current_flag, 2: effective & expiry date)
- Type 3: new column (current division, previous division, every row change at the same time, like reorganization)