3 types of Data Modelling

Relational Modelling - Transactional Data
Dimensional Modelling - Analytical Data, Data warehouse
Enterprise Modelling - Entire Organization Data

Normalization

Normalization is a way of organizing the data in the database. We use normalization in relational data modelling. Normalization reduces data redundancy and inconsistent data dependency.

Levels of Normalization

First Normal Form (1NF) - Single valued - Every attribute is single valued
Second Normal Form (2NF) - Complete - Must be in 1NF

In data modelling we should resolve many to many relationships by introducing bridge or associative entity

An entity must be complete

Third Normal Form (3NF) - Only - Must be in 2NF All the non primary attributes must be only dependent on the primary key attribute and not on any other attribute.

Star Schema

It is composed of a single fact table that references any number of dimension tables.

Dimensional Modelling

Dimensional modeling always uses the concept of facts (measures, the quantitative data for analysis), and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts. For example, sales amount is a fact; timestamp, product, register#, store#, etc. are elements of dimensions.

Data Mart

Data marts are a subset of data warehouses oriented to a specific line of business or functional area of an organization (e.g., marketing, finance, sales).

Granularity

Granularity represents the level of information stored in a table.

Slowly changing dimension

These are dimensions used to manage both historical data and current data in data warehousing
Reference: https://en.wikipedia.org/wiki/Slowly_changing_dimension
A slowly changing dimension (SCD) in data management and data warehousing is a dimension which contains relatively static data which can change slowly but unpredictably, rather than according to a regular schedule.
Some scenarios can cause referential integrity problems. Dealing with these issues involves SCD management methodologies referred to as Type 0 through 6.

Type 0: retain original (contant, no change)

The Type 0 dimension attributes never change and are assigned to attributes that have durable values or are described as 'Original'. Examples: Date of Birth. Type 0 applies to most date dimension attributes.

Type 1: overwrite

This method overwrites old with new data, and therefore does not track historical data.

Type 2: add new row

This method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers. Unlimited history is preserved for each insert.

Supplier_Key	Supplier_Code	Supplier_Name	Supplier_State	Version
123	ABC	Acme Supply Co	CA	0
124	ABC	Acme Supply Co	IL	1

Another method is to add 'effective date' columns.

Supplier_Key	Supplier_Code	Supplier_Name	Supplier_State	Start_Date	End_Date
123	ABC	Acme Supply Co	CA	2000-01-01T00:00:00	2004-12-22T00:00:00
124	ABC	Acme Supply Co	IL	2004-12-22T00:00:00	NULL

The Start date/time of the second row is equal to the End date/time of the previous row. The null End_Date in row two indicates the current tuple version.

The Start date/time of the second row is equal to the End date/time of the previous row. The null End_Date in row two indicates the current tuple version. A standardized surrogate high date (e.g. 9999-12-31) may instead be used as an end date, so that the field can be included in an index, and so that null-value substitution is not required when querying.

And a third method uses an effective date and a current flag.

Supplier_Key	Supplier_Code	Supplier_Name	Supplier_State	Effective_Date	Current_Flag
123	ABC	Acme Supply Co	CA	2000-01-01T00:00:00	N
124	ABC	Acme Supply Co	IL	2004-12-22T00:00:00	Y

The Current_Flag value of 'Y' indicates the current tuple version.
Type 2 SCDs are not a good choice if the dimensional model is subject to frequent change

Type 3: add new attribute

This method tracks changes using separate columns and preserves limited history. The original table structure in Type 1 and Type 2 is the same but Type 3 adds additional columns.

Supplier_Key	Supplier_Code	Supplier_Name	Original_Supplier_State	Effective_Date	Current_Supplier_State
123	ABC	Acme Supply Co	CA	2004-12-22T00:00:00	IL

This record contains a column for the original state and current state—cannot track the changes if the supplier relocates a second time.
One variation of this is to create the field Previous_Supplier_State instead of Original_Supplier_State which would track only the most recent historical change.

Type 4: add history table

one table keeps the current data, and an additional table is used to keep a record of some or all changes.

Type 5

Type 6: combined approach The Type 6 method combines the approaches of types 1, 2 and 3 (1 + 2 + 3 = 6).

Home - satyamsingh1004/DataModelling GitHub Wiki

3 types of Data Modelling

Normalization

Levels of Normalization

Star Schema

Dimensional Modelling

Data Mart

Granularity

Slowly changing dimension

Type 0: retain original (contant, no change)

Type 1: overwrite

Type 2: add new row

Type 3: add new attribute

Type 4: add history table

Type 5

⚠️ GitHub.com Fallback ⚠️

Home - satyamsingh1004/DataModelling GitHub Wiki

3 types of Data Modelling

Normalization

Levels of Normalization

Star Schema

Dimensional Modelling

Data Mart

Granularity

Slowly changing dimension

Type 0: retain original (contant, no change)

Type 1: overwrite

Type 2: add new row

Type 3: add new attribute

Type 4: add history table

Type 5

⚠️ **GitHub.com Fallback** ⚠️

⚠️ GitHub.com Fallback ⚠️