Relational Database Design - yibinericxia/documents GitHub Wiki
Relational Database
Data Modeling & Schema/Table Design
How to model the real world and map UX design logically and smoothly with data entities and relations/constraints requires experiences. Do not pile up different data types which should be constrained differently into a single table just for easy lookup, and implement those constraint checks in applications. Analyze and normalize the data representation and relationship first (especially many-to-many) using ER diagram, and evaluate efforts on how to control/manipulate data (also with SQL script for updating, importing and exporting). Data integration constraint rules should stay inside database, not applications.
Please remember not to use the model of EAV (Entity-Attribute-Value) below for quick solution as no constraints can be applied to those values since the its data type is forced into "varchar". Since the EAV model often put data and their metadata together, complex SQL queries are often needed to retrieve meaningful resources.
| id | obj_id | attribute | value |
|---|---|---|---|
| 1 | 101 | action | create |
| 2 | 101 | isLegal | true |
| 3 | 101 | action_date | 2015-01-01 |
| 4 | 701 | [email protected] | |
| 5 | 701 | first name | Joe |
However, the following table using the flat model is not EAV
| id | employee_id | year | month | leave_days |
|---|---|---|---|---|
| 1 | 101 | 2015 | 1 | 0 |
| 2 | 101 | 2015 | 2 | 0.5 |
| 3 | 102 | 2015 | 1 | 2 |
| 4 | 102 | 2015 | 2 | 1 |
| 5 | 102 | 2015 | 3 | 0 |
and it has advantages over the table with splitting attributes below, even though we can use "union" or "unpivot" to transform data into one of the above
| id | employee_id | year | jan | feb | march | april ... |
|---|---|---|---|---|---|---|
| 1 | 101 | 2015 | 0 | 0.5 | ||
| 2 | 102 | 2015 | 2 | 1 | 0 |
Company data in database and its lifecycle are usually much longer than of applications. Data compliance and security need to be considered besides its usage in client applications. SQL server physical limitations and SQL query performance may also play an important role in table design.
The sensitive data can be decoupled from the rest in a way by being put in a separate table, thus its primary key can be referred in other tables or bridge tables. In this way, we can replace the sensitive data in its own table without affecting other tables for testing or demo. See database security for more data security solutions.
Primary ID
The common way to use numerical sequence as primary id which is managed by database. For a clustered database and other horizontal scale sequence generator, coordination among all the nodes are required and that may impact the performance.
UUID becomes popular due to its global uniqueness. It has 5 segments and its size is 4 times larger than a sequence. Hibernate 4 and JPA 3.1 above provides UUID generation strategies, such as @GeneratedValue(strategy = GenerationType.UUID) in JPA, @GenericGenerator(name = "UUID", strategy = "org.hibernate.id.UUIDGenerator") or @UuidGenerator(style = Style.RANDOM/AUTO/TIME) which is based on IP & timestamp in Hibernate.
Timestamp
Consideration:
- the application uses different database in different environment?
- multiple applications use the single database?
Database
@COLUMN(name = "created_date", columnDefinition = "TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
private Timestamp createdDate;
Use Hibernate with the VM timestamp like below:
@CreationTimestamp
@Temporal(TemporalType.TIMESTAMP)
@COLUMN(name = "created_date", nullable = false)
private Date createdDate;
@UpdateTimestamp
@Temporal(TemporalType.TIMESTAMP)
@COLUMN(name = "updated_date")
private Date updatedDate;
TypeScript
export interface MyEntity {
CREATED_DATE: Date,
UPDATED_DATE: Date,
...
}
View and Materialized View
View is a virtual copy of a table created by executing a SELECT query statement on columns of one or more tables, thus is always up-to-date. It is normally used when data has to be accessed infrequently, but data gets updated frequently.
Materialized view is the content from a SELECT query like the normal view, but stored as a physical table, thus data access is faster than the normal view. It is used when data is to be accessed frequently and data in table not get updated on frequent basis. A good example is the historical sales data snapshot with computation and data type conversion for quick reporting and analysis.
Views help to write clearer, more modular SQL code with proper access privileges for security with column/row level control. Views are also helpful if multiple applications share the same database so that all the apps can get the coherent data from views which encapsulates the complex logic for consistency. Views can also help in maintenance and migration when underlying database schema change or optimization is needed, or version and compatibility needs to be addressed.
Using more intuitive names for the view columns and consolidating/calculating/masking the proper data to different audience can be achieved via views. Views can also be set to log access for auditing and monitoring.
Database Integrity
Integrity Constraints
Some constraints, such as the primary key constraint, unique constraint, referential constraint, have its corresponding indexes created automatically by database system to enforce the constraints, so do not create index for those constraints. For the id column which should be the primary id with the missing primary key constraint, do not created the unique index to fix the issue. Instead add the primary key constraint so that that constraint gets dropped and its unique index gets dropped automatically.
The following integrity constraints types need considerations:
- Entity Integrity
PK for table cannot be null and should be unique. If sequence is used, its cache size should also need consideration. Please see the above section on Primary ID
Any unique constraint on columns will have its unique index created automatically by the database server.
- Referential Integrity
FK must have a matching PK or it must be null (orphan row) if accepted. PK & FK have the same data type and are from the same domain.
- Enterprise Integrity
The enterprise or semantic constraints are specific rules, such as the salary of an employee cannot exceed his/her manager's salary.