database normalization - ghdrako/doc_snipets GitHub Wiki

https://en.wikipedia.org/wiki/Database_normalization#Example_of_a_step-by-step_normalization

obraz

Steps to normalization

Normalization operates on specific assumptions. Before initiating the normalization process, it is crucial to validate the accuracy of these assumptions.

ensure that there is only one fact per column. For instance, a full_name column where users can input both first and last names contains multiple strings that could constitute separate facts. As this information could be used separately in different parts of the business process, it is preferable to split it into separate columns
check that no data were lost during the transition to tabular form, especially in the case of unstructured data, such as text.
remove data that can be derived, deduced, or calculated from other fields unless needed for performance reasons.
each table must have a primary key. A primary key is defined as the minimal set of columns that contain a unique combination of values for each row in the table, allowing for the distinct identification of each record.

First normal form and repeating groups

Tabela (encja) jest w pierwszej postaci normalnej, kiedy wiersz przechowuje informacje o pojedynczym obiekcie, nie zawiera kolekcji, posiada klucz główny (kolumnę lub grupę kolumn jednoznacznie identyfikujących go w zbiorze) a dane są atomowe. Wszystko tak naprawdę zależy od definicji atomowości, jaka nas satysfakcjonuje. Jeśli atomowa informacja o adresie to dla nas nazwa ulicy z numerem to wszystko jest ok. Co innego jeśli wymogi biznesowe potrzebują rozdzielenia tych informacji – np. system zleceń dla firmy kurierskiej. Obsługa jednej ulicy może być realizowana przez kilku kurierów (np. długa ulica Piotrkowska w Łodzi) – tu z pewnością trzeba by rozdzielić te informacje na dwie kolumny

The first normal form requires each table cell to have only one value and the records to be unique

Befor normalization
After - table in 1NF

In the preceding table, it is possible to find unique records using a combination of {Name, Phone, Sport} fields.

Putting data into table form and identifying and removing repeating groups. This can be done by splitting the original table into several tables—one for the primary data and one for each repeating group. First, we remove each distinct set of repeating group columns and relocate them to a new table. It is essential to include the primary key of the core table in each new table, serving as a cross-reference. In the repeating group tables, this is referred to as a foreign key since it is the primary key of a foreign table. It is crucial to determine if the sequence of the repeated groups holds business value and to add a sequence column accordingly. obraz obraz

After defining the new tables, we assign relevant names to them and identify the new primary key for each table. Identifying this new key may be the most challenging aspect of the normalization process. If we cannot find a new set of columns uniquely identifying each record, we can create a new unique key. Following this first step, our tables are technically in the first form of normalization, also known as the first normal form; this means that all data of the same kind are held in the same place, and the repeating groups are now only limited by the maximum number of possible rows in their own table.

Second normal form

2NF mowi ze każda tabela powinna przechowywać dane dotyczące tylko konkretnej klasy obiektów. Normalizując do 2NF, wydzielić należy zbiór atrybutów (kolumn) który jest zależny tylko od klucza głównego. Wszystkie atrybuty informacyjne (nie należące do klucza), muszą zawierać informacje o elementach tej konkretnej klasy (encji, tabeli) a nie żadnej innej. Kolumny opisujące inne obiekty, powinny trafić do właściwych encji (tabel) w których te obiekty będziemy przechowywać.

In order to normalize a database into the second normal form, it should already be in 1NF and should have single-column primary key.

Identifying determinants

Upon further examination of our data, we may discover that the content of one column influences other columns in the table, meaning they are determinants of the other columns.

Determinants are not necessarily single columns; they can also be a combination of columns. For instance, the first_name and last_name columns can determine the birth_date, gender, and all personally identifiable information columns in a table. In the second step, we need to identify any determinants besides the primary key and the columns they influence. Next, establish a new table for each determinant and its associated columns. Assign a suitable name for the new table and use the determinant column as the new primary key. In the original table, eliminate the determined columns, retaining only the determinants to act as a foreign key used as a reference between tables.

obraz

Third normal form (3NF)

W 3NF kazda kolumna informacyjna nie należąca do klucza nie zależy też od innej kolumny informacyjnej, nie należącej do klucza. Czyli każdy niekluczowy argument jest bezpośrednio zależny tylko od klucza głównego a nie od innej kolumny. Np kolumny Kwota_netto,Kwota_brutto,Vat. Pole Kwota_brutto zalezy od Kwota_netto i Vat wiec moze zostac usunieta bez utraty informacji. Czesta łamie sie swiadomie 3NF w przypadku czasochlonych wyliczen zeby zaoszczedzic czas.

For a database to be normalized into the third normal form, it should already be in 2NF and should have no transitive functional dependencies.

Many texts recommend beginning by identifying the determinants that are part of the table’s primary key. Once these determinants are moved to new tables, the table reaches its second normal form (2NF). However, the 2NF is merely a stage in the process, as we also need to relocate all determinants that are not part of the primary key. After addressing all determinants, our tables will be in their 3NF.

In more complex scenarios, where every column in the table is part of the primary key or where tables contain time-dependent data, other normalization forms are used. However, the 3NF is usually enough to achieve a robust data model.

#NF zapewnia:

Kazda kolumna jest atomowa (przechowuje pojedyncza wartosc)
Kazda kolumna zalezy tylko od klucza glownego
Zadna kolumna nie zalezy atrybotow nie bedacych kluczam glownym

One-to-One Relationships

A one-to-one relationship means that each record in one table corresponds to exactly one record in another table. This type of relationship is less common but incredibly useful for separating data that doesn’t need to be accessed together frequently. When to Use It

Storing sensitive information separately (e.g., user credentials).
Splitting rarely accessed data into its own table to improve performance.

Example application where users have detailed proﬁles:

CREATE TABLE User (
id SERIAL PRIMARY KEY,
name VARCHARU E10) NUT NULL,
email VAF.CHAR(2 5 5) UNIQUE NOT NULL
);
CREATE TABLE Proﬁle (
id SERIAL PRIMARY KEY,
bio TEXT,
avatar_url VARCHAR(255),
set_id INT UNIQUE REFERENCES User(id) ON DELETE CASCADE
);

In this example:

The User table contains core user data like name and email.
The Proﬁle table contains optional details like a bio and avatar_url.
The user_id column in the Proﬁle table ensures each proﬁle is linked to exactly one user.

One-to-Many Relationships

This is the most common type ofrelationship. One record in the parent table corresponds to mul- tiple records in the child table. Think ofa blog where one user can write multiple posts. When to Use It

Modeling hierarchical data (e.g., categories and products).
Tracking historical events (e.g., orders and order items). Code Example: Users and Posts Here’s how you might model a blogging platform:

CREATE TABLE User (
id SERIAL PRIMARY KEY,
name YARCHAR(1UU) NOT NULL,
email VARCHAR(25 5) UNIQUE NOT NULL
);

CREATE TABLE Post (
id SERIAL PRIMARY KEY,
title VARCHARQOU) NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
user_id INT REFERENCES User(id) UN DELETE CASCADE
);

Key details:

Each post is linked to a user through the user_id foreign key.
The ON DELETE CASCADE ensures that if a user is deleted, their posts are automatically removed.

For better query performance, consider adding an index on the user_id column in the Post table.

CREATE TABLE Student (
id SERIAL PRIMARY KEY,
name VARCHAR(1UD) NOT NULL
);
CREATE TABLE Course (
id SERIAL PRIMARY KEY,
title VARCHARQDU) NOT NULL
);
CREATE TABLE Enrollment (s11L1de11t_id INT REFERENCES Student(id) ON DELETE CASCADE,
course_id INT REFERENCES Course(id) ON DELETE CASCADE,
enrolled_at TIMESTAMP DEFAULT NOT NULL,
PRIMARY KEY (student_id, course_id)
);

Details to note:

The Enrollment table serves as the join table, linking Student and CULITSE.
The composite primary key ( student_id , course_id ) ensures each student can enroll in a course only once.
The enrolled_at column tracks when the relationship was created, adding valuable contest.