Grokking SDI Intro - ayaohsu/Personal-Resources GitHub Wiki

We should focus less on mechanics and more on trade-offs.

SDI Strategy

Ask refining questions -> Handle the data -> Discuss the components -> Discuss trade-offs
  1. Refining questions:
  • Functional requirements: what clients need directly
  • Non-functional requirements: what clients need indirectly
  2. Handle the data:
  • What's the size of the data right now & at what rate is the data expected to grow?
  • How will the data be consumed?
  • Is the data read-heavy or write-heavy?
  • Strict consistency vs eventual consistency
  • Durability target of the data
  3. Components
  • Front-end components, load balancers, caches, databases, firewalls, CDNs
  4. Trade-offs
  • Cost
  • Performance
  • Technical complexity
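
For the "size of the data and rate of growth" question, a rough linear projection is usually enough in an interview. The sketch below is only an illustration: the function name and every input number (2 TB today, 5 million new items per day, 200 KB per item) are made-up assumptions, not figures from these notes.

```python
TB = 10**12  # decimal terabyte, in bytes

def projected_storage_bytes(current_bytes, daily_new_items, bytes_per_item, days):
    """Linear growth: what we store today plus everything added over `days`."""
    return current_bytes + daily_new_items * bytes_per_item * days

# Hypothetical workload: 2 TB today, 5 million new items/day at 200 KB each.
size_in_one_year = projected_storage_bytes(
    current_bytes=2 * TB,
    daily_new_items=5_000_000,
    bytes_per_item=200_000,
    days=365,
)
print(f"Estimated storage after one year: {size_in_one_year / TB:.0f} TB")
```

With these assumed inputs the system adds about 1 TB per day, so the growth rate matters far more than the current size when choosing storage components.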

Consistency

In a distributed system, consistency means that every replica node has the same view of the data at a given point in time.

Weakest <----------> Strongest

Eventual Consistency <-> Causal Consistency <-> Sequential Consistency <-> Strict Consistency

Strict consistency guarantees that a read request to any replica returns the most recently written value.
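
To make the two ends of the spectrum concrete, here is a toy sketch. `ReplicatedRegister` is a hypothetical class written for these notes, not a real library, and it assumes a single primary with in-process followers. Synchronous replication stands in for strict consistency and a deferred `replicate()` step stands in for eventual consistency; real systems need far more machinery (failures, ordering, consensus).

```python
class ReplicatedRegister:
    """Toy single-value store: one primary plus N follower replicas."""

    def __init__(self, n_followers, synchronous):
        self.primary = None
        self.followers = [None] * n_followers
        self.synchronous = synchronous
        self.pending = []  # writes not yet applied to the followers

    def write(self, value):
        self.primary = value
        if self.synchronous:
            # Strict-style: every replica is updated before the write returns.
            self.followers = [value] * len(self.followers)
        else:
            # Eventual-style: followers catch up later.
            self.pending.append(value)

    def replicate(self):
        """Background sync: apply pending writes to all followers, in order."""
        for value in self.pending:
            self.followers = [value] * len(self.followers)
        self.pending.clear()

    def read(self, follower_index):
        return self.followers[follower_index]


strict = ReplicatedRegister(n_followers=2, synchronous=True)
strict.write("v1")
strict_read = strict.read(0)        # sees the latest write immediately

eventual = ReplicatedRegister(n_followers=2, synchronous=False)
eventual.write("v1")
stale_read = eventual.read(0)       # follower has not caught up yet
eventual.replicate()
converged_read = eventual.read(0)   # replicas agree once replication has run
```

The trade-off shows up in `write`: the strict-style path does more work before returning, while the eventual-style path returns quickly but allows stale reads until replication completes.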

Non-functional System Characteristics

  • Availability
    • (Total time - time service was down) / Total time
  • Reliability
    • Probability that the service will perform its functions for a specified period of time
    • Mean time between failures (MTBF) = (total time - sum of downtime) / total number of failures
    • Mean time to repair (MTTR) = total maintenance time / total number of repairs
  • Scalability
    • The ability of a system to handle an increasing amount of workload without compromising performance
    • (From Designing Data-Intensive Applications) As the system grows, there should be reasonable ways of dealing with that growth
    • Workload: request workload or storage workload
  • Maintainability
    • Many different people should be able to work on the app productively
      • Operability: Is it easy to monitor the health of the system and track down the cause of problems?
      • Simplicity: Is the system easy to understand, or is it complex?
      • Evolvability: Is it easy to add a new feature?
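
The availability, MTBF, and MTTR formulas listed above can be written out directly. The sample figures (one year of operation, 8.76 hours of downtime spread over 4 failures) are assumptions picked to give round results, not measurements.

```python
def availability(total_time, downtime):
    """(Total time - time service was down) / total time."""
    return (total_time - downtime) / total_time

def mtbf(total_time, downtime, failures):
    """Mean time between failures: uptime divided by number of failures."""
    return (total_time - downtime) / failures

def mttr(total_maintenance_time, repairs):
    """Mean time to repair: maintenance time divided by number of repairs."""
    return total_maintenance_time / repairs

# Hypothetical year: 8760 hours total, 8.76 hours down, 4 failures/repairs.
HOURS_PER_YEAR = 365 * 24
a = availability(HOURS_PER_YEAR, 8.76)
between_failures = mtbf(HOURS_PER_YEAR, 8.76, 4)
to_repair = mttr(8.76, 4)

print(f"availability: {a:.3%}")
print(f"MTBF: {between_failures:.2f} h, MTTR: {to_repair:.2f} h")
```

Under these assumptions availability comes out at roughly 99.9% ("three nines"), which corresponds to a little under nine hours of downtime per year.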

Back-of-the-envelope Calculations

Server type | RAM | Processor | Hard drive
Web server | Low | High | Low
Application server | High | Medium | Medium
Storage server | Low | Medium | High

Numbers to remember

# Latency
L1 - L3 cache: 1 - 10 ns  
Main memory reference: 100 ns
Read 1 MB sequentially from memory: 9 μs
Read 1 MB sequentially from SSD: 200 μs
Read 1 MB sequentially from disk: 2 ms
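
As a quick sanity check on how these latency numbers scale, the sketch below estimates the time to read a larger amount of data sequentially from each medium. It assumes read time grows linearly with size, which ignores seeks, caching, and parallelism; the dictionary and function names are illustrative only.

```python
# Microseconds to read 1 MB sequentially, taken from the list above.
US_PER_MB = {"memory": 9, "ssd": 200, "disk": 2000}

def sequential_read_ms(megabytes, medium):
    """Estimated milliseconds to read `megabytes` sequentially from `medium`."""
    return megabytes * US_PER_MB[medium] / 1000

# Reading 1 GB (taken as 1000 MB) from each medium.
for medium in US_PER_MB:
    print(f"{medium}: {sequential_read_ms(1000, medium):.0f} ms")
```

The ratios are the useful part: under these figures SSD is roughly 20x slower than memory, and disk is roughly 10x slower than SSD.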

# Throughput  
QPS handled by MySQL: 1000   
QPS handled by key-value store: 10000  
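
A typical back-of-the-envelope use of these throughput numbers is estimating how many servers a given load needs. The sketch below assumes the per-server QPS figures above and a hypothetical peak of 25,000 QPS; it ignores replication, headroom, and uneven load.

```python
import math

# Approximate per-server throughput, taken from the list above.
QPS_PER_SERVER = {"mysql": 1000, "key_value_store": 10000}

def servers_needed(peak_qps, qps_per_server):
    """Smallest whole number of servers that can absorb `peak_qps`."""
    return math.ceil(peak_qps / qps_per_server)

PEAK_QPS = 25_000  # hypothetical peak load
mysql_servers = servers_needed(PEAK_QPS, QPS_PER_SERVER["mysql"])
kv_servers = servers_needed(PEAK_QPS, QPS_PER_SERVER["key_value_store"])
print(f"MySQL servers: {mysql_servers}, key-value servers: {kv_servers}")
```

In practice the estimate would be padded for failures and traffic spikes, but the order of magnitude is what matters in the interview.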