Database Design | Expertifie - sulabh84/SystemDesign GitHub Wiki

  • Identify Entities
  • Identify Relationships
  • Remove Redundant Relationships
  • For Many to Many Relationships, create an association entity
  • Normalization of data
    • it is a process of organizing the data in database
    • it is used to reduce redundancy. It is also used to eliminate the undesirable characteristics like Insertion, Update and Deletion anomalies
    • it divides the larger table into the smaller table and links them using relationships
    • The normal form is used to reduce redundancy from the database table

Normal Forms

  • 1st NF - if a relation contains a composite or multi-valued attribute, it violates first normal form. A relation is in first normal form if every attribute in that relation is single valued attribute. E.g. One person has multiple phone numbers
  • 2nd NF - relation must be in first normal form and relation must not contain any partial relationship. A relation is in 2NF if it has No Partial Dependency. Non Prime attribute should be dependent on Prime attribute. E.g. person to location to Pin code (location to pincode 1-1 relationship)
  • 3rd NF - it should be 2NF and there is no transitive dependency for non-prime attributes. E.g. person to state to country (state to country 1-M relationship)

Indexes

  • Storing data by sorting in a particular format to make your read query efficient. write will be slower with indexes
  • Clustered - sorting is performed on the table. By default there can be only one clustered index
  • Non-Clustered indexes - index is sorted but data might not be present in the sorted order. indexes are sorted separately

ACID

  • Atomicity
    • no partial update in a transaction.
    • All or None
  • Consistency
    • Both DB and application layer are responsible for maintaining consistency.
    • DB can enforce invariants. If data is deleted from primary table then foreign table should also delete it for consistency
  • Isolation
    • Multiple transactions occur concurrently without leading to inconsistency in data
    • Change occurring in other transaction would not be visible to another transaction until committed
  • Durability
    • persist when the transaction commits
   Problem with two transactions without locks:
       Transaction T1                   Transaction T2
          Read(X)
        x=x+1 -> 43
          Write(X)
                                           Read(X)
                                          x=x+1 -> 43
         Commit(X)                         Write(X)
                                           Commit(X)

Transaction Locks

  • At any given point there can be multiple read locks or there can only one write lock
  • Deadlock
  ReadT1 -> X
  ReadT2 -> X
  WriteT1 -> Fail (Unless T2 leaves Read Lock, T1 Can't take write lock)
  WriteT2 -> Fail (Unless T1 leaves Read lock, T2 Can't take write lock)
  • Solution -> Aging Mechanism
    • ReadT1
    • ReadT2
    • WriteT1 -> Force T2 to leave the lock
    • ReadT2 is gone
    • WriteT1
    • CommitT1
    • ReadT2
    • WriteT2
  • To maintain Isolation / Consistency / Atomicity and durability
    • Avoid Dirty Read
      • Read the data only when it is committed
    • Avoid Dirty Write
      • Overwrite the data which has be committed
      • Solution
        • Serial Execution
          • One thread at a time can read/write data for a particular row
          • Bad solution - Not scalable
        • Read-Write lock mechanism
          • Read Lock (Shared) - Multiple Read Locks on a row
          • Write Lock (Mutex) - Only one write on a row
            • Deadlock - Preferences based on aging and other parameters
            • Overhead to maintain these details
        • Optimistic Locking
          • at the time of commit, transaction will check if the read data is same at the time of commit if yes then commit otherwise rollback the transaction
          • Check when transaction Commits
          • No need to take unnecessary locks on a row
          • Helps to detects stale reads

Questions

  • what is two phase commit
  • Service level aggregation
  • Change data capture methodology