Dimensional Model - Deepankar-Singh-Pawar/LearningNotes GitHub Wiki

Dimensional Modeling: Types of Dimensions and Facts

Types of Dimensions

1. Conformed Dimensions

  • Definition: Shared dimensions used across multiple fact tables or data marts, ensuring consistent analysis.
  • Example: A Customer dimension shared between Sales and Support fact tables, with attributes like Customer ID, Name, and Region.
  • Implementation: Maintain the same structure and keys in all associated data marts.

2. Role-Playing Dimensions

  • Definition: A single dimension table that plays multiple roles within a fact table.
  • Example: A Date dimension used as Order Date, Ship Date, and Delivery Date.
  • Implementation: Use aliases or multiple joins in SQL to reference the same table with different roles.

3. Slowly Changing Dimensions (SCD)

  • Definition: Handle changes to dimension attributes over time.
  • Types:
    • Type 1: Overwrite old data (e.g., correcting a typo in Customer Address).
    • Type 2: Create a new row for each change with effective dates (e.g., tracking Customer Address history).
    • Type 3: Add new columns to store both current and historical values (e.g., Current Region and Previous Region).
  • Implementation: Choose SCD type based on the business requirement for historical tracking.

4. Junk Dimensions

  • Definition: Combine unrelated low-cardinality attributes into a single dimension to simplify the model.
  • Example: A Flag Dimension for attributes like IsPromotional and IsReturned.
  • Implementation: Create a single table with all combinations of these attributes, reducing clutter in the fact table.

5. Degenerate Dimensions

  • Definition: A dimension stored as an attribute within the fact table, not as a separate dimension table.
  • Example: Order Number or Invoice ID in a Sales fact table.
  • Implementation: Store as a column in the fact table since it doesn’t have additional descriptive attributes.

6. Hierarchical Dimensions

  • Definition: Capture hierarchical relationships, either in a single table or as normalized tables.
  • Example: A Geography dimension with Country → State → City.
  • Implementation: Include hierarchical levels as separate columns or use snowflake schema for normalization.

7. Mini Dimensions

  • Definition: Handle rapidly changing attributes by splitting them into smaller dimensions.
  • Example: A Customer Demographics mini-dimension for attributes like Age Group and Income Level.
  • Implementation: Create a separate dimension table and use a surrogate key to link it to the fact table.

8. Shrunken Dimensions

  • Definition: A subset of attributes from a larger dimension, often used for aggregated facts.
  • Example: A Time dimension with only Month and Year levels for a monthly sales summary.
  • Implementation: Create a smaller version of the original dimension to improve performance.

9. Inferred Dimensions

  • Definition: Placeholder dimension rows created during ETL when complete data is unavailable.
  • Example: A Customer dimension row with only a Customer ID and default values for other attributes.
  • Implementation: Update the inferred row later when full data is available.

10. Role-Constrained Dimensions

  • Definition: A single dimension filtered for specific roles.
  • Example: A Product dimension filtered for Current Products and Discontinued Products.
  • Implementation: Apply filters or constraints in reporting tools or views.

Types of Facts

1. Additive Facts

  • Definition: Can be summed across all dimensions.
  • Example: Sales Revenue or Units Sold.
  • Implementation: Use for measures that aggregate easily across dimensions.

2. Semi-Additive Facts

  • Definition: Can be summed across some dimensions but not all.
  • Example: Account Balance (can be summed by accounts but not across time).
  • Implementation: Use business rules to define aggregation behavior.

3. Non-Additive Facts

  • Definition: Cannot be summed across any dimension.
  • Example: Ratios like Profit Margin or percentages.
  • Implementation: Calculate non-additive facts at query time using derived metrics.

4. Factless Fact Tables

  • Definition: Contain no measures, only foreign keys to dimensions to track events or relationships.
  • Example: A Student Attendance fact table to record which students attended which classes.
  • Implementation: Use for tracking many-to-many relationships or events.

5. Cumulative Snapshot Facts

  • Definition: Capture measures that accumulate over time.
  • Example: Total Revenue or Total Inventory.
  • Implementation: Update the snapshot at regular intervals.

6. Transactional Facts

  • Definition: Record detailed, granular metrics about individual transactions.
  • Example: Sales Transaction capturing Price, Quantity, and Discount.
  • Implementation: Use for fine-grained analysis and operational reporting.

7. Periodic Snapshot Facts

  • Definition: Record metrics at regular intervals to analyze trends.
  • Example: Daily Account Balances.
  • Implementation: ETL processes should create snapshots at consistent time intervals.

8. Derived Facts

  • Definition: Facts calculated from existing measures.
  • Example: Profit Margin calculated as (Revenue - Cost) / Revenue.
  • Implementation: Store derived facts in reporting tools or materialize them in views.

Summary Table

Dimensions

Dimension Type Example Use Case
Conformed Dimensions Customer, Product Shared across data marts
Role-Playing Dimensions Date (Order Date, Ship Date) Multiple roles in a fact table
Slowly Changing Dimensions Customer Address Track attribute changes over time
Junk Dimensions Flags (IsReturned, IsPromotional) Simplify low-cardinality attributes
Degenerate Dimensions Order Number Operational identifiers
Hierarchical Dimensions Geography (Country, State, City) Analyze hierarchical relationships
Mini Dimensions Customer Demographics Rapidly changing attributes
Shrunken Dimensions Time (Year, Month) Aggregated fact tables
Inferred Dimensions Placeholder Customers Missing data in ETL

Facts

Fact Type Example Use Case
Additive Facts Sales Revenue Summable across dimensions
Semi-Additive Facts Account Balance Summable across some dimensions
Non-Additive Facts Profit Margin Ratios and percentages
Factless Fact Tables Student Attendance Track events or relationships
Cumulative Snapshot Facts Total Revenue Track metrics over time
Transactional Facts Sales Transactions Granular transaction-level analysis
Periodic Snapshot Facts Daily Account Balances Time-series trend analysis
Derived Facts Profit Margin Calculated measures

Would you like further examples or diagrams to visualize these concepts?