Google Typehead System Design | Expertifie - sulabh84/SystemDesign GitHub Wiki

Search Bar -> User is typing few characters, you should be able to return top 10 suggestions to the user for the typed char.
Suggestion to be returned should have the maximum frequency in your table

Latency -> low approx 10ms
Availability -> High Availability (5 9's)
Consistency -> Not required immediately but the system should be eventually consistent
Reliability
- System should be reliable in the sense that it doesn't capture user specific information
- Best suggestions are returned from the system
- Security layers are in-place
- Rate limiter

Assumptions
- Approx 100M word search per day
- 0.001% of the words being searched are new into the system
- Avg size of the word is 7-8 characters
- 500M words already present in the dictionary
QPS
- Read QPS
  - 3 (char user has to type to make a suggestion)*100 * 10^6 (Per day) / 24 * 60 * 60 (86400)
  - 3*10^8/10^5
  - 310^3 = 31000 read QPS * 2 (Multiplier) = 6000
- Write QPS
  - Read to write ratio is 3:1
  - Write QPS -> 2000 - 3000 write per second
Capacity
- 100M search per day
- For an Entry
  - Word
  - Freq
- 500M (Current Disc.)
  - 500M * (7 size on avg of each word) * (1 char is 1 byte) * 100 bytes (metadata) -> words already present
  - Every day -> 500M * 0.001 -> 50K new words everyday -> 50K * 7 * 1 * 100(metadata) -> 35K * 10^3 = 35M per day new words
  - for next 1 year
    - 500M7100 + 35M*365 bytes
    - 35M * 10^4 + ~ (1M * 10^4)
    - 35 * 10^6 * 10^4 / (102410241024)
    - 360 GB of data per year = ~400GB

80% request 20% data and 20% request 80% data
Cache those 20% prefixes
- Trie -> cache Optimize on read and write
  - Invalidations
  - Strategy to remove a prefix from the cache -> LRU
- Write Around cache