Steps Involved in Designing DW - sachit914/datawarehouse GitHub Wiki

Steps Involved in Designing DW

  • Define Business Use Cases
  • Identify the Grain
  • Design Dimensions
  • Design Fact Tables
  • Develop ETL Processes
  • Implement Slowly Changing Dimensions (SCD)
  • Testing and Validation
  • Documentation and Training

1. Define Business Use Cases

  • Gather Requirements: Work with stakeholders to gather detailed requirements for each business process.

2. Identify the Grain

  • grain refers to the level of detail at which data is stored in the fact table. It answers the question: "What does one row of data in this table represent?"

Fine Grain (Very Detailed):

Transaction ID: 123
Product Sold: Laptop
Store: New York Store
Date: October 18, 2024
Time: 3:15 PM
Quantity Sold: 1
Total Price: $1000
  • Imagine you store every single transaction. Each row in your table shows one product sold in one transaction.
  • This level of detail is very useful for answering questions like:
    • "What products are selling the most at 3 PM?"
    • "How many laptops were sold in a specific transaction?"

4. Design Fact Tables

  • a fact table is where you store the numeric data (like sales, profits, quantities) that you want to analyze. These numbers are called facts.
  • The fact table collects measurements (like total sales, total cost, etc.) Identify Facts
    • Facts are the numbers or measurements you want to analyze. These could be things like:
      • Sales Amount (How much was sold?)
      • Quantity Sold (How many items were sold?)
      • Profit (How much profit was made?)
      • Discount Given (What was the discount?)
  • Foreign Keys: These are columns that link the fact table to the dimension tables.
  • Degenerate Dimensions: Sometimes, you have attributes (like transaction ID or invoice number) that don’t fit neatly into any dimension table. These are called degenerate dimensions because they come directly from the fact itself. They don’t have their own table.
  • Example: Transaction ID might be stored in the fact table, as it uniquely identifies each transaction.
Date Key	Product Key	Store Key	Customer Key	Sales Amount	Quantity Sold	Transaction ID
20241018	1234	5678	9876	$1000	2	87654
20241019	2345	5678	5432	$500	1	98765

4. Design Dimensions

  • A dimension table is a table in a data warehouse that contains descriptive attributes (often textual) that give context to the facts in a fact table. Dimensions answer questions like who, what, when, where, and how about the data and help in organizing and categorizing the information
  • For example, if you're analyzing sales, dimensions might include products, time (dates), store locations, or customers. These dimensions help describe who bought a product, what product was bought, when it was bought, and where it was bought.

5. Develop ETL Processes

  • Extract, Transform, Load (ETL): Design and implement ETL processes to populate the dimension and fact tables. This involves extracting data from source systems, transforming it to fit the data warehouse schema, and loading it into the data warehouse.
  • Data Quality and Cleansing: As part of the ETL process, implement steps to ensure data quality and perform data cleansing to correct any issues in the source data.

6. Implement Slowly Changing Dimensions (SCD)

  • Manage Dimension Changes: Dimensions can change over time (e.g., a product's price changes, or a customer relocates). Plan for how to handle these changes in your dimension tables.

7. Testing and Validation

  • Test for Accuracy: Before going live, thoroughly test the data warehouse to ensure that it accurately reflects the source data and meets the business requirements.
  • Test for Accuracy: Before going live, thoroughly test the data warehouse to ensure that it accurately reflects the source data and meets the business requirements.