Data Warehouse Terminologies - sachit914/datawarehouse GitHub Wiki

Data Warehouse Terminologies

ETL Vs ELT

ETL

Extract Transform Load (ETL) Used in On premise BI / data warehouse Solution

image

ELT

Extract Transform Load (ETL) Used in On premise BI / data warehouse Solution

  • We extract data and dump raw data into data Lake then using etl tool like glue and use spark for data transformation

image

Stagging Vs ODS

staging

  • A landing zone, is an intermediate storage area used for data processing during the extract, transform and load (ETL) process.
  • A landing zone, is an intermediate storage area used for data processing during the extract, transform and load (ETL) process.

ODS- Operational Data Source

  • An Operational Data Store (ODS) is a central database that collects and stores the most up-to-date data from multiple operational (transactional) systems. It is primarily used for operational reporting and allows businesses to generate quick reports and perform analysis on real-time data without affecting the core systems.
  • Real-Time Data: Unlike a data warehouse, which usually stores historical data, an ODS provides a snapshot of the most recent data. This means it is regularly updated to reflect the current state of operational systems, like customer transactions or orders.
  • Separate from Core Systems
  • Real-Time Data

image

OLTP VS OLAP

OLTP

  • OLTP: Online transaction processing (OLTP) captures, stores, and processes data from transactions in real time.

image

OLAP

  • OLAP: Online analytical processing (OLAP) uses complex queries to analyze aggregated historical data from OLTP systems.
  • In OLAP data will be in multi deminesional model

image

Fact Vs Dimension

Fact

  • Fact: Represents what you want to analyze.
  • Fact table Contains Measures along with Foreign Key reference of Dimension Tables.
  • Usually Transaction tables from OLTP System

Dimension Table

  • Dimension: Describes by which criteria you want to analyze the facts.
  • A dimension is a Business entity and have it's own attributes and this table is related to fact.
  • Usually Reference / Lookup tables in OLTP System

Star Schema Vs Snow Flake Schema

star schema

image

Snow Flake Schema

Types of Dimensions

  • Confirmed Dimension
  • Role-Playing Dimension
  • Junk Dimension
  • Degenerate Dimension
  • Slowly Changing Dimension

Surrogate Key

Granularity

Cardinality