2: BACK‐OF‐THE‐ENVELOPE ESTIMATION - swchen1234/systemDesign GitHub Wiki

In a system design interview, sometimes you are asked to estimate system capacity or performance requirements using a back-of-the-envelope estimation. The following concepts should be well understood: power of two [2], latency numbers every programmer should know, and availability numbers.

Power of two
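
Data volume units are conventionally expressed as powers of two: a byte is 8 bits, and each step from KB to MB to GB to TB to PB multiplies by 2^10. The sketch below is a minimal illustration of that convention; the helper names `to_bytes` and `humanize` are hypothetical and only exist for this example.

```python
# Data volume units as powers of two (binary convention: 1 KB = 2**10 bytes).
UNITS = {
    "KB": 2 ** 10,  # ~1 thousand bytes
    "MB": 2 ** 20,  # ~1 million bytes
    "GB": 2 ** 30,  # ~1 billion bytes
    "TB": 2 ** 40,  # ~1 trillion bytes
    "PB": 2 ** 50,  # ~1 quadrillion bytes
}


def to_bytes(value: float, unit: str) -> int:
    """Convert a value in the given unit to a byte count."""
    return int(value * UNITS[unit])


def humanize(num_bytes: float) -> str:
    """Render a byte count using the largest unit that keeps the value >= 1."""
    for unit in ("PB", "TB", "GB", "MB", "KB"):
        if num_bytes >= UNITS[unit]:
            return f"{num_bytes / UNITS[unit]:.1f} {unit}"
    return f"{num_bytes:.0f} B"
```

In an interview it is usually enough to round 2^10 to 1,000 and work in decimal, which is what the Twitter example later on this page does.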

Latency numbers

Observations

  • Memory is fast but the disk is slow.
  • Avoid disk seeks if possible.
  • Simple compression algorithms are fast.
  • Compress data before sending it over the internet if possible.
  • Data centers are usually in different regions, and it takes time to send data between them.
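
The observations above can be made concrete with a few ratios. The sketch below uses rough, order-of-magnitude figures from the widely circulated "latency numbers every programmer should know" list; they are dated approximations included here as an assumption, and real hardware will differ. The dictionary keys and the `slowdown` helper are hypothetical names for this example.

```python
# Rough latencies in nanoseconds (approximate, commonly cited figures;
# treat them as orders of magnitude, not measurements).
NS = 1
US = 1_000 * NS
MS = 1_000 * US

LATENCY_NS = {
    "main_memory_reference": 100 * NS,
    "read_1mb_from_memory": 250 * US,
    "datacenter_round_trip": 500 * US,
    "disk_seek": 10 * MS,
    "read_1mb_from_disk": 30 * MS,
    "cross_continent_round_trip": 150 * MS,
}


def slowdown(slow: str, fast: str) -> float:
    """Return how many times slower one operation is than another."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


if __name__ == "__main__":
    # Sequential 1 MB read: disk versus memory.
    print(slowdown("read_1mb_from_disk", "read_1mb_from_memory"))
    # Round trip: between continents versus inside one data center.
    print(slowdown("cross_continent_round_trip", "datacenter_round_trip"))
```

With these figures, a sequential 1 MB read from disk is on the order of a hundred times slower than from memory, and a cross-continent round trip is a few hundred times slower than one inside a data center.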

Availability numbers

High availability is the ability of a system to stay continuously operational. 100% means the service has zero downtime. Most services fall between 99% and 100%.

  • A service level agreement (SLA) is a commonly used term for service providers. This is an agreement between you (the service provider) and your customer, and this agreement formally defines the level of uptime your service will deliver.
  • Cloud providers Amazon [4], Google [5] and Microsoft [6] set their SLAs at 99.9% or above.
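
An availability percentage translates directly into a downtime budget: the allowed downtime is (1 - availability) multiplied by the length of the period. The sketch below shows that arithmetic; the function name `allowed_downtime_seconds` and the 365-day year are assumptions made for this example.

```python
def allowed_downtime_seconds(availability_pct: float, period_days: float = 365) -> float:
    """Return the downtime budget in seconds for a given availability.

    availability_pct is a percentage such as 99.9; period_days defaults
    to a 365-day year (leap years are ignored in this estimate).
    """
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * period_days * 24 * 3600


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        hours = allowed_downtime_seconds(pct) / 3600
        print(f"{pct}% availability allows about {hours:.2f} hours of downtime per year")
```

At 99.9% the budget is about 8.76 hours per year, and each additional nine divides it by ten.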

An Example: Twitter Estimation

Assumptions:

  • 300 million monthly active users.
  • 50% of users use Twitter daily.
  • Users post 2 tweets per day on average.
  • 10% of tweets contain media.
  • Data is stored for 5 years.

Estimations:
Queries per second (QPS) estimate:

  • Daily active users (DAU) = 300 million * 50% = 150 million
  • Tweets QPS = 150 million * 2 tweets / 24 hours / 3600 seconds = ~3500
  • Peak QPS = 2 * QPS = ~7000
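
The QPS arithmetic above can be written as a short script, which makes it easy to vary the assumptions. This is a minimal sketch; `estimate_qps` and its parameter names are hypothetical, and the peak factor of 2 is the same rule of thumb used in the estimate.

```python
SECONDS_PER_DAY = 24 * 3600


def estimate_qps(monthly_active_users: float,
                 daily_active_ratio: float,
                 tweets_per_user_per_day: float,
                 peak_factor: float = 2.0):
    """Return (DAU, average QPS, peak QPS) for the given assumptions."""
    dau = monthly_active_users * daily_active_ratio
    qps = dau * tweets_per_user_per_day / SECONDS_PER_DAY
    return dau, qps, qps * peak_factor


if __name__ == "__main__":
    dau, qps, peak = estimate_qps(300_000_000, 0.5, 2)
    print(f"DAU: {dau:,.0f}, QPS: {qps:,.0f}, peak QPS: {peak:,.0f}")
```

The exact results are about 3,472 and 6,944, which round to the ~3500 and ~7000 quoted above; that level of precision is all a back-of-the-envelope estimate needs.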

We will only estimate media storage here.

  • Average tweet size:
    • tweet_id 64 bytes
    • text 140 bytes
    • media 1MB
  • Media storage: 150 million * 2 * 10% * 1 MB = 30 TB per day
  • 5-year media storage: 30 TB * 365 * 5 = ~55 PB
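
The media storage figures follow the same pattern. The sketch below reproduces them using decimal units (1 TB = 10^6 MB, 1 PB = 10^3 TB), which is the rounding the estimate implicitly relies on; the function and parameter names are hypothetical.

```python
def media_storage(dau: float,
                  tweets_per_user_per_day: float,
                  media_ratio: float,
                  media_size_mb: float,
                  years: float):
    """Return (daily storage in TB, total storage in PB) for media only.

    Uses decimal units (1 TB = 1e6 MB, 1 PB = 1e3 TB) and 365-day years.
    """
    daily_mb = dau * tweets_per_user_per_day * media_ratio * media_size_mb
    daily_tb = daily_mb / 1_000_000
    total_pb = daily_tb * 365 * years / 1_000
    return daily_tb, total_pb


if __name__ == "__main__":
    daily_tb, total_pb = media_storage(150_000_000, 2, 0.10, 1, 5)
    print(f"{daily_tb:.0f} TB per day, {total_pb:.2f} PB over 5 years")
```

The 5-year total comes out to 54.75 PB, which the estimate rounds to ~55 PB.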

Commonly asked back-of-the-envelope estimations: QPS, peak QPS, storage, cache, number of servers, etc. You can practice these calculations when preparing for an interview. Practice makes perfect.