Database Design | Expertifie - sulabh84/SystemDesign GitHub Wiki

Identify Entities
Identify Relationships
Remove Redundant Relationships
For Many to Many Relationships, create an association entity
Normalization of data
- it is a process of organizing the data in database
- it is used to reduce redundancy. It is also used to eliminate the undesirable characteristics like Insertion, Update and Deletion anomalies
- it divides the larger table into the smaller table and links them using relationships
- The normal form is used to reduce redundancy from the database table

Normal Forms

1st NF - if a relation contains a composite or multi-valued attribute, it violates first normal form. A relation is in first normal form if every attribute in that relation is single valued attribute. E.g. One person has multiple phone numbers
2nd NF - relation must be in first normal form and relation must not contain any partial relationship. A relation is in 2NF if it has No Partial Dependency. Non Prime attribute should be dependent on Prime attribute. E.g. person to location to Pin code (location to pincode 1-1 relationship)
3rd NF - it should be 2NF and there is no transitive dependency for non-prime attributes. E.g. person to state to country (state to country 1-M relationship)

Indexes

Storing data by sorting in a particular format to make your read query efficient. write will be slower with indexes
Clustered - sorting is performed on the table. By default there can be only one clustered index
Non-Clustered indexes - index is sorted but data might not be present in the sorted order. indexes are sorted separately

ACID

Atomicity
- no partial update in a transaction.
- All or None
Consistency
- Both DB and application layer are responsible for maintaining consistency.
- DB can enforce invariants. If data is deleted from primary table then foreign table should also delete it for consistency
Isolation
- Multiple transactions occur concurrently without leading to inconsistency in data
- Change occurring in other transaction would not be visible to another transaction until committed
Durability
- persist when the transaction commits

   Problem with two transactions without locks:
       Transaction T1                   Transaction T2
          Read(X)
        x=x+1 -> 43
          Write(X)
                                           Read(X)
                                          x=x+1 -> 43
         Commit(X)                         Write(X)
                                           Commit(X)

Transaction Locks

At any given point there can be multiple read locks or there can only one write lock
Deadlock

  ReadT1 -> X
  ReadT2 -> X
  WriteT1 -> Fail (Unless T2 leaves Read Lock, T1 Can't take write lock)
  WriteT2 -> Fail (Unless T1 leaves Read lock, T2 Can't take write lock)

Solution -> Aging Mechanism
- ReadT1
- ReadT2
- WriteT1 -> Force T2 to leave the lock
- ReadT2 is gone
- WriteT1
- CommitT1
- ReadT2
- WriteT2

To maintain Isolation / Consistency / Atomicity and durability
- Avoid Dirty Read
  - Read the data only when it is committed
- Avoid Dirty Write
  - Overwrite the data which has be committed
  - Solution
    - Serial Execution
      - One thread at a time can read/write data for a particular row
      - Bad solution - Not scalable
    - Read-Write lock mechanism
      - Read Lock (Shared) - Multiple Read Locks on a row
      - Write Lock (Mutex) - Only one write on a row
        
        Deadlock - Preferences based on aging and other parameters
        
        Overhead to maintain these details
    - Optimistic Locking
      - at the time of commit, transaction will check if the read data is same at the time of commit if yes then commit otherwise rollback the transaction
      - Check when transaction Commits
      - No need to take unnecessary locks on a row
      - Helps to detects stale reads

Questions

what is two phase commit
Service level aggregation
Change data capture methodology