Home - satyamsingh1004/DataModelling GitHub Wiki
- Relational Modelling - Transactional Data
- Dimensional Modelling - Analytical Data, Data warehouse
- Enterprise Modelling - Entire Organization Data
Normalization is a way of organizing the data in the database. We use normalization in relational data modelling. Normalization reduces data redundancy and inconsistent data dependency.
- First Normal Form (1NF) - Single valued - Every attribute is single valued
- Second Normal Form (2NF) - Complete - Must be in 1NF
In data modelling we should resolve many to many relationships by introducing bridge or associative entity
An entity must be complete
- Third Normal Form (3NF) - Only - Must be in 2NF All the non primary attributes must be only dependent on the primary key attribute and not on any other attribute.
- It is composed of a single fact table that references any number of dimension tables.
- Dimensional modeling always uses the concept of facts (measures, the quantitative data for analysis), and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts. For example, sales amount is a fact; timestamp, product, register#, store#, etc. are elements of dimensions.
- Data marts are a subset of data warehouses oriented to a specific line of business or functional area of an organization (e.g., marketing, finance, sales).
- Granularity represents the level of information stored in a table.
- These are dimensions used to manage both historical data and current data in data warehousing
- Reference: https://en.wikipedia.org/wiki/Slowly_changing_dimension
- A slowly changing dimension (SCD) in data management and data warehousing is a dimension which contains relatively static data which can change slowly but unpredictably, rather than according to a regular schedule.
- Some scenarios can cause referential integrity problems. Dealing with these issues involves SCD management methodologies referred to as Type 0 through 6.
- The Type 0 dimension attributes never change and are assigned to attributes that have durable values or are described as 'Original'. Examples: Date of Birth. Type 0 applies to most date dimension attributes.
- This method overwrites old with new data, and therefore does not track historical data.
This method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers. Unlimited history is preserved for each insert.
Supplier_Key | Supplier_Code | Supplier_Name | Supplier_State | Version |
---|---|---|---|---|
123 | ABC | Acme Supply Co | CA | 0 |
124 | ABC | Acme Supply Co | IL | 1 |
Another method is to add 'effective date' columns.
Supplier_Key | Supplier_Code | Supplier_Name | Supplier_State | Start_Date | End_Date |
---|---|---|---|---|---|
123 | ABC | Acme Supply Co | CA | 2000-01-01T00:00:00 | 2004-12-22T00:00:00 |
124 | ABC | Acme Supply Co | IL | 2004-12-22T00:00:00 | NULL |
The Start date/time of the second row is equal to the End date/time of the previous row. The null End_Date in row two indicates the current tuple version.
-
The Start date/time of the second row is equal to the End date/time of the previous row. The null End_Date in row two indicates the current tuple version. A standardized surrogate high date (e.g. 9999-12-31) may instead be used as an end date, so that the field can be included in an index, and so that null-value substitution is not required when querying.
And a third method uses an effective date and a current flag.
Supplier_Key | Supplier_Code | Supplier_Name | Supplier_State | Effective_Date | Current_Flag |
---|---|---|---|---|---|
123 | ABC | Acme Supply Co | CA | 2000-01-01T00:00:00 | N |
124 | ABC | Acme Supply Co | IL | 2004-12-22T00:00:00 | Y |
- The Current_Flag value of 'Y' indicates the current tuple version.
- Type 2 SCDs are not a good choice if the dimensional model is subject to frequent change
- This method tracks changes using separate columns and preserves limited history. The original table structure in Type 1 and Type 2 is the same but Type 3 adds additional columns.
Supplier_Key | Supplier_Code | Supplier_Name | Original_Supplier_State | Effective_Date | Current_Supplier_State |
---|---|---|---|---|---|
123 | ABC | Acme Supply Co | CA | 2004-12-22T00:00:00 | IL |
- This record contains a column for the original state and current state—cannot track the changes if the supplier relocates a second time.
- One variation of this is to create the field Previous_Supplier_State instead of Original_Supplier_State which would track only the most recent historical change.
- one table keeps the current data, and an additional table is used to keep a record of some or all changes.
Type 6: combined approach The Type 6 method combines the approaches of types 1, 2 and 3 (1 + 2 + 3 = 6).