concept: schema - davidkhala/data-warehouse GitHub Wiki

Database tables schema design choices for analytical use

Normalize and Denormalize

Denormalize

  • 代价:数据冗余,预处理
  • 收益:读取速度

Star Schema

denormalized dimension tables

  • More storage space in use
  • difficult to update and troubleshoot

Snowflake schema

More complexity in design by further normalizing dimension tables

  • more JOIN between dimension tables

Fact table

Fact tables contain the numerical data that you want to analyze.

  • typically have a large number of rows and are the primary source of data for analysis.

Dimension table

Dimension tables contain descriptive information about the data in the fact tables.

  • descriptive backdrop: typically have a small number of rows and are used to provide context for the data in the fact tables.
  • Think of a dimension table as the who, what, where, when, why of your data warehouse

SCD: versioning of Dimension table