concept: schema - davidkhala/data-warehouse GitHub Wiki
Database tables schema design choices for analytical use
Normalize and Denormalize
Denormalize
- 代价:数据冗余,预处理
- 收益:读取速度
Star Schema
denormalized dimension tables
- More storage space in use
- difficult to update and troubleshoot
Snowflake schema
More complexity in design by further normalizing dimension tables
- more JOIN between dimension tables
Fact table
Fact tables contain the numerical data that you want to analyze.
- typically have a large number of rows and are the primary source of data for analysis.
Dimension table
Dimension tables contain descriptive information about the data in the fact tables.
- descriptive backdrop: typically have a small number of rows and are used to provide context for the data in the fact tables.
- Think of a dimension table as the who, what, where, when, why of your data warehouse