Database Design | Expertifie - sulabh84/SystemDesign GitHub Wiki
- Identify Entities
- Identify Relationships
- Remove Redundant Relationships
- For Many to Many Relationships, create an association entity
- Normalization of data
  - It is the process of organizing the data in a database
  - It reduces redundancy and eliminates undesirable characteristics such as insertion, update and deletion anomalies
  - It divides a larger table into smaller tables and links them using relationships
  - Normal forms are successive levels of normalization, each removing a further kind of redundancy
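The many-to-many step above can be sketched with Python's built-in `sqlite3` module. The `student`, `course` and `enrollment` tables and the helper `courses_for` are hypothetical names chosen for illustration; the point is only that the association entity holds one row per pair and carries a foreign key to each side.

```python
import sqlite3

# Hypothetical schema: students and courses are many-to-many,
# so an "enrollment" association entity links them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(id),
        course_id  INTEGER NOT NULL REFERENCES course(id),
        PRIMARY KEY (student_id, course_id)  -- one row per (student, course) pair
    );
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO course  VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

def courses_for(student_id):
    """Return the course titles a student is enrolled in, via the association table."""
    rows = conn.execute(
        "SELECT c.title FROM course c "
        "JOIN enrollment e ON e.course_id = c.id "
        "WHERE e.student_id = ? ORDER BY c.title",
        (student_id,),
    ).fetchall()
    return [title for (title,) in rows]
```

Neither `student` nor `course` stores a list of the other; adding or removing an enrollment touches only the association table.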
Normal Forms
- 1st NF - A relation is in first normal form if every attribute holds a single, atomic value. A composite or multi-valued attribute violates 1NF. E.g. one person has multiple phone numbers stored in a single column
- 2nd NF - The relation must be in 1NF and must have no partial dependency: every non-prime attribute must depend on the whole candidate key, not on just a part of it. E.g. with key (person, location), pin code depends only on location (each location maps to one pin code), so it belongs in a separate location table
- 3rd NF - The relation must be in 2NF and must have no transitive dependency of a non-prime attribute on the key. E.g. person to state to country: country depends on state, not directly on person (many states map to one country)
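As a small illustration of the 1NF example above, the sketch below splits a multi-valued phone field into one row per phone number. The sample rows and the function name `to_first_normal_form` are made up for this example.

```python
# Hypothetical unnormalized rows: several phone numbers packed into one field (violates 1NF).
raw_people = [
    {"person_id": 1, "name": "Asha", "phones": "555-0100,555-0101"},
    {"person_id": 2, "name": "Ravi", "phones": "555-0200"},
]

def to_first_normal_form(rows):
    """Split the multi-valued 'phones' field into a separate table, one row per number."""
    people = [{"person_id": r["person_id"], "name": r["name"]} for r in rows]
    phones = [
        {"person_id": r["person_id"], "phone": p.strip()}
        for r in rows
        for p in r["phones"].split(",")
        if p.strip()
    ]
    return people, phones

people, phones = to_first_normal_form(raw_people)
```

After the split, every attribute holds a single value, and a person's phone numbers are found by matching `person_id` across the two tables.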
Indexes
- An index stores data sorted on chosen columns to make read queries efficient. Writes become slower because every index must be updated too
- Clustered - the table rows themselves are stored in index order, so a table can have only one clustered index
- Non-Clustered - the index is sorted and stored separately, with pointers to the rows; the table data need not be in that order
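One way to see the read-side effect of an index is to compare query plans before and after creating it. This is a minimal sketch using `sqlite3` and `EXPLAIN QUERY PLAN`; the `user` table, the index name `idx_user_email` and the helper `query_plan` are assumptions for illustration, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO user (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

def query_plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

lookup = "SELECT name FROM user WHERE email = 'user42@example.com'"
plan_without_index = query_plan(lookup)   # describes a scan of the whole table

# A secondary index on email, stored separately from the table rows.
conn.execute("CREATE INDEX idx_user_email ON user (email)")
plan_with_index = query_plan(lookup)      # now mentions idx_user_email
```

The cost shows up on writes: every later `INSERT` or `UPDATE` of `email` has to maintain `idx_user_email` as well as the table.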
ACID
- Atomicity
  - No partial updates in a transaction
  - All or none
- Consistency
  - Both the DB and the application layer are responsible for maintaining consistency
  - The DB can enforce invariants. E.g. if a row is deleted from the parent table, rows in child tables that reference it through a foreign key should also be deleted
- Isolation
  - Multiple transactions can run concurrently without leading to inconsistent data
  - Changes made by one transaction are not visible to other transactions until it commits
- Durability
  - Once a transaction commits, its changes persist, even after a crash
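Atomicity and a DB-enforced invariant can be sketched together with `sqlite3`. The `account` table, the non-negative balance `CHECK` and the `transfer` helper are hypothetical; the sketch assumes the default transaction handling of Python's `sqlite3` module, where `with conn:` commits on success and rolls back if an exception escapes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # invariant enforced by the DB
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK failed, so the credit above was rolled back too

def balances():
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

Calling `transfer(1, 2, 500)` credits account 2 first, then fails the debit on account 1; because both statements share one transaction, the credit is undone and the balances stay as they were.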
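Problem with two transactions without locks (X starts at 42, and T2 does not see T1's uncommitted write):

| Transaction T1 | Transaction T2 |
| --- | --- |
| Read(X) | |
| x=x+1 -> 43 | |
| Write(X) | |
| | Read(X) |
| | x=x+1 -> 43 |
| Commit(X) | Write(X) |
| | Commit(X) |

Both transactions commit 43, so one increment is lost; the expected result is 44.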
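The interleaving above can be replayed deterministically in plain Python. This is a toy model, not a real database: `committed` stands for the committed state, and each transaction buffers its write until commit. The starting value 42 is inferred from the `-> 43` annotations.

```python
committed = {"X": 42}          # what other transactions can see

t1_x = committed["X"]          # T1: Read(X) -> 42
t1_x = t1_x + 1                # T1: x = x + 1 -> 43
t1_pending = {"X": t1_x}       # T1: Write(X), buffered until commit

t2_x = committed["X"]          # T2: Read(X) -> 42 (T1's write is not committed yet)
t2_x = t2_x + 1                # T2: x = x + 1 -> 43

committed.update(t1_pending)   # T1: Commit
t2_pending = {"X": t2_x}       # T2: Write(X)
committed.update(t2_pending)   # T2: Commit, silently overwriting T1's update

final_value = committed["X"]   # 43
expected_if_serial = 42 + 2    # 44 if the two increments had run one after the other
```

Two increments were requested but only one survives, which is the lost-update problem that locks or commit-time checks are meant to prevent.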
Transaction Locks
- At any given point, a row can have either multiple read locks or exactly one write lock
- Deadlock
  - ReadT1 -> X
  - ReadT2 -> X
  - WriteT1 -> Blocked (T1 can't take the write lock until T2 releases its read lock)
  - WriteT2 -> Blocked (T2 can't take the write lock until T1 releases its read lock)
- Solution -> Aging Mechanism (e.g. the older transaction, here T1, gets priority)
  - ReadT1
  - ReadT2
  - WriteT1 -> Force T2 to release its read lock
  - ReadT2 is gone
  - WriteT1
  - CommitT1
  - ReadT2
  - WriteT2
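The aging sequence above can be sketched as a toy single-row lock manager. Everything here is an assumption made for illustration: the class `RowLock`, its method names, and the specific rule that a lower start timestamp (an older transaction) may evict younger readers while a younger writer must wait. Real databases use more elaborate schemes.

```python
class RowLock:
    """Toy read/write lock for one row; a lower start timestamp means an older transaction."""

    def __init__(self):
        self.readers = {}    # transaction name -> start timestamp
        self.writer = None   # transaction currently holding the write lock
        self.evicted = []    # transactions forced to release their lock and retry

    def acquire_read(self, txn, start_ts):
        if self.writer is not None:
            return False                     # a writer excludes all readers
        self.readers[txn] = start_ts
        return True

    def acquire_write(self, txn, start_ts):
        if self.writer is not None and self.writer != txn:
            return False                     # only one write lock at a time
        others = {t: ts for t, ts in self.readers.items() if t != txn}
        if any(ts < start_ts for ts in others.values()):
            return False                     # an older reader exists: the requester waits
        for t in others:                     # requester is oldest: evict younger readers
            del self.readers[t]
            self.evicted.append(t)
        self.readers.pop(txn, None)          # upgrade own read lock to a write lock
        self.writer = txn
        return True

    def release(self, txn):
        self.readers.pop(txn, None)
        if self.writer == txn:
            self.writer = None

lock = RowLock()
lock.acquire_read("T1", start_ts=1)                 # ReadT1
lock.acquire_read("T2", start_ts=2)                 # ReadT2
t1_write = lock.acquire_write("T1", start_ts=1)     # WriteT1 -> T2 is forced out
lock.release("T1")                                  # CommitT1
t2_read = lock.acquire_read("T2", start_ts=2)       # ReadT2 (retry)
t2_write = lock.acquire_write("T2", start_ts=2)     # WriteT2
```

Because exactly one of the two transactions is always allowed to proceed, the circular wait from the deadlock case cannot form; the price is that the evicted transaction has to redo its read.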
- To maintain Isolation / Consistency / Atomicity and Durability
  - Avoid Dirty Read
    - Read data only once it is committed
  - Avoid Dirty Write
    - Overwrite only data that has been committed
- Solution
  - Serial Execution
    - Only one thread at a time can read/write data for a particular row
    - Bad solution - not scalable
  - Read-Write lock mechanism
    - Read Lock (Shared) - multiple read locks on a row
    - Write Lock (Mutex) - only one write lock on a row
    - Deadlock - resolved with preferences based on aging and other parameters
    - Overhead to maintain these details
  - Optimistic Locking
    - At commit time, the transaction checks whether the data it read is still unchanged. If yes, it commits; otherwise it rolls back
    - The check happens only when the transaction commits
    - No need to take unnecessary locks on a row
    - Helps detect stale reads
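A common way to implement the commit-time check in optimistic locking is a version column, sketched here with `sqlite3`. The `counter` table, the `version` column and the helper names are assumptions for illustration; the notes above only say that the read data is compared at commit time, and a version number is one way to do that comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counter (id INTEGER PRIMARY KEY, value INTEGER, version INTEGER)")
conn.execute("INSERT INTO counter VALUES (1, 42, 0)")
conn.commit()

def read_counter():
    return conn.execute("SELECT value, version FROM counter WHERE id = 1").fetchone()

def commit_if_unchanged(new_value, version_read):
    """Write only if the row still has the version seen at read time."""
    with conn:
        cur = conn.execute(
            "UPDATE counter SET value = ?, version = version + 1 "
            "WHERE id = 1 AND version = ?",
            (new_value, version_read),
        )
    return cur.rowcount == 1   # 0 rows updated means the read was stale

# Both transactions read the same state (value 42, version 0) without taking locks.
t1_value, t1_version = read_counter()
t2_value, t2_version = read_counter()

t1_ok = commit_if_unchanged(t1_value + 1, t1_version)   # accepted
t2_ok = commit_if_unchanged(t2_value + 1, t2_version)   # rejected: the version moved on

# T2 retries from a fresh read.
t2_value, t2_version = read_counter()
t2_retry_ok = commit_if_unchanged(t2_value + 1, t2_version)
final_value, final_version = read_counter()
```

No lock is held between the read and the write; the stale write is detected at commit time and retried, so both increments are applied.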
Questions
- What is two-phase commit?
- Service level aggregation
- Change data capture methodology