Twitter HLD | Expertifie
- Create Profile - User
- Update Profile - User
- Log in to the account
- Post a tweet
- React on a tweet (Like, Comment)
- Follow other users
- Fetch the latest tweets for the user - Feed generation
- Low latency - within a second
- Availability - High
- Consistency - Eventual Consistency should be fine
- Availability should be preferred over Consistency
- Reliability
- DDoS (Distributed Denial of Services)
- Block any IP that sends more than X requests within a time window (see the rate-limiting sketch after this list)
- System should be highly available
- Proper authentication and authorization checks in the system
- Data transfer should be secure - requests and responses should be encrypted
- Backups to restore data for disaster recovery
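A minimal sketch of the per-IP rate limiting mentioned in the DDoS bullet above, assuming a fixed-window counter kept in process memory. The request limit and window length are illustrative values; a production setup would typically keep these counters in a shared store (for example Redis) in front of the application servers.

```python
import time
from collections import defaultdict

# Illustrative limits: at most 100 requests per IP per 60-second window.
MAX_REQUESTS = 100
WINDOW_SECONDS = 60

_request_counts = defaultdict(lambda: [0, 0.0])  # ip -> [count, window_start]

def is_allowed(ip):
    """Return False (block) once an IP exceeds MAX_REQUESTS in the current window."""
    now = time.time()
    count, window_start = _request_counts[ip]
    if now - window_start >= WINDOW_SECONDS:
        _request_counts[ip] = [1, now]   # start a fresh window for this IP
        return True
    if count >= MAX_REQUESTS:
        return False                     # over the limit -> block / drop the request
    _request_counts[ip][0] += 1
    return True
```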
- Estimations
- Assumptions
- 500M users in a year
- 100M daily active users
- 50k new users signing up daily
- On an average, 50% of active users will create 1 tweet per day
- Every active user gives 10 reactions per day
- Every active user follows one other user per day on average
- On an average, a user will check for the latest tweets 5 times a day
- QPS (queries per second)
- Read QPS
- 10M (User Login)
- 100M*5 (Fetch latest Tweets)
- 500 * 10^6 / (60*60*24) ≈ 500 * 10^6 / 10^5
- 5000 QPS for reading
- Write QPS (in a day)
- 50k (Create profile)
- 1M (Update Profile)
- 50M (Create Tweets)
- 100M*10 (Reactions)
- 100M (Follow)
- ~1200M write requests per day
- 1200 * 10^6 / (60*60*24) ≈ 1200 * 10^6 / 10^5
- 12000 QPS for writing
- Load will not be evenly distributed across the day, so peak traffic in some hours can be noticeably higher than the averages calculated above
- Multiplier factor -> 1.5
- Read = 5000 * 1.5 = 7500 QPS
- Write = 12000 * 1.5 = 18000 QPS
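The QPS figures above can be reproduced with simple arithmetic; a quick sketch using the same 86400 ≈ 10^5 seconds-per-day rounding as the estimates:

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 86400, rounded to ~10^5 in the estimates above

daily_reads = 10e6 + 100e6 * 5                         # logins + feed fetches ≈ 510M
daily_writes = 50e3 + 1e6 + 50e6 + 100e6 * 10 + 100e6  # profiles + updates + tweets + reactions + follows ≈ 1151M

read_qps = daily_reads / 1e5    # ~5100, rounded to ~5000 above
write_qps = daily_writes / 1e5  # ~11510, rounded to ~12000 above

PEAK_MULTIPLIER = 1.5
print(read_qps * PEAK_MULTIPLIER, write_qps * PEAK_MULTIPLIER)  # ~7650 and ~17300, i.e. the ~7500 / ~18000 peak figures
```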
- Capacity (storage) - for at least a year
- (500M + 50K * 365) * 1000 bytes -> Users
- ≈ (500M + 400 * 50K) * 1000
- ≈ 520B (billion) bytes ≈ 520 GB
- 50M * 365 * 500 bytes -> Tweets
- ≈ 50M * 400 * 500
- = 10^13 bytes
- ≈ 10000B bytes ≈ 10 TB
- 100M * 10 * 365 * 100 bytes -> Reactions
- ≈ 100M * 10 * 400 * 100
- ≈ 40000B bytes ≈ 40 TB
- 100M * 365 * 100 bytes -> Follows
- ≈ 100M * 400 * 100
- ≈ 4000B bytes ≈ 4 TB
- ~55000B bytes total ≈ 55 TB in a year
- Note: We have not considered replications here.
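A sketch of the same capacity math, with 365 days rounded to 400 as above and per-row sizes taken from the table definitions in the next section (replication excluded):

```python
DAYS = 400  # 365 rounded up, as in the estimate above

users     = (500e6 + 50e3 * DAYS) * 1000  # ~520e9 bytes ≈ 0.52 TB
tweets    = 50e6 * DAYS * 500             # ~1e13 bytes  ≈ 10 TB
reactions = 100e6 * 10 * DAYS * 100       # ~4e13 bytes  ≈ 40 TB
follows   = 100e6 * DAYS * 100            # ~4e12 bytes  ≈ 4 TB

total_tb = (users + tweets + reactions + follows) / 1e12
print(round(total_tb, 1))  # 54.5 -> roughly 55 TB per year before replication
```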
- APIs
- CreateProfile() returns success/failure
- UserLogin(string username, string password)
- UpdateProfile(List, List)
- CreateTweet(UserId,String TweetContent)
- ReactOnTweet(UserId, TweetId, ReactionType, Content)
- FetchLatestTweets(UserId)
- FollowUser(UserId -> Follower, UserId -> Followee)
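A minimal sketch of these APIs as service method stubs. The parameter names, the Response type, and in particular the UpdateProfile arguments (whose list types are not spelled out above) are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class Response:
    success: bool
    payload: Any = None

class TwitterService:
    def create_profile(self, name: str, email: str, password: str) -> Response: ...
    def user_login(self, username: str, password: str) -> Response: ...
    def update_profile(self, user_id: str, fields: List[str], values: List[str]) -> Response: ...
    def create_tweet(self, user_id: str, tweet_content: str) -> Response: ...
    def react_on_tweet(self, user_id: str, tweet_id: str, reaction_type: str,
                       content: Optional[str] = None) -> Response: ...
    def fetch_latest_tweets(self, user_id: str, limit: int = 20) -> Response: ...
    def follow_user(self, follower_id: str, followee_id: str) -> Response: ...
```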
- Tables
- User Table (1000 bytes)
- UserId(PK), Password (Encrypted), Name, DoB, Phone Number, Email Address, Profile Picture, CreationTimeStamp
- Tweet Table (500 bytes)
- TweetId(PK), UserId -> Author of the tweet, TweetContent, CreationTimeStamp
- Reaction Table (100 Bytes)
- ReactionId(PK), TweetId, UserId, ReactionType (Like, Comment), Reaction Details -> only populated for comments, CreationTimeStamp
- FollowUser Table (100 bytes)
- UserId -> Follower, UserId -> Followee, TimeStamp
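The same tables expressed as record types, to make the fields and the per-row sizes used in the capacity estimate concrete; the field types are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class User:                 # ~1000 bytes per row
    user_id: str            # PK
    password_hash: str      # stored encrypted/hashed, never in plain text
    name: str
    dob: str
    phone_number: str
    email_address: str
    profile_picture_url: str
    creation_timestamp: datetime

@dataclass
class Tweet:                # ~500 bytes per row
    tweet_id: str           # PK
    user_id: str            # author of the tweet
    tweet_content: str
    creation_timestamp: datetime

@dataclass
class Reaction:             # ~100 bytes per row
    reaction_id: str        # PK
    tweet_id: str
    user_id: str
    reaction_type: str      # "Like" or "Comment"
    reaction_details: str   # only populated for comments
    creation_timestamp: datetime

@dataclass
class FollowUser:           # ~100 bytes per row
    follower_id: str
    followee_id: str
    timestamp: datetime
```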
- Horizontal sharding
- User Table
- Region based shard - 1st level sharding
- Hashing based on the userid - 2nd level sharding
- Tweet Table
- Region based shard
- Hashing based on the userid
- Reaction Table
- Region based shard
- Hashing based on the tweetid
- FollowUser Table
- Region based shard
- Hashing based on the userid
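A sketch of the two-level routing above: pick the regional cluster first, then hash the shard key (userid for the User, Tweet and FollowUser tables, tweetid for the Reaction table) to pick a shard within that region. The region names and shard count are illustrative.

```python
import hashlib

REGIONS = {"us", "eu", "apac"}  # illustrative region list (1st level)
SHARDS_PER_REGION = 64          # illustrative shard count per region (2nd level)

def shard_for(region, shard_key):
    """Route to '<region>-shard-<n>' by hashing the shard key (userid or tweetid)."""
    if region not in REGIONS:
        raise ValueError("unknown region: %s" % region)
    digest = hashlib.md5(shard_key.encode()).hexdigest()
    return "%s-shard-%d" % (region, int(digest, 16) % SHARDS_PER_REGION)

print(shard_for("us", "user_12345"))   # User/Tweet/FollowUser tables shard on userid
print(shard_for("eu", "tweet_98765"))  # Reaction table shards on tweetid
```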
- Master Slave configuration
- Data will be written to the master and read from the slaves
- Multi-master configuration (Master -> Master -> Master), with each master having slaves under it
- Helps in better distribution of load across layers
- Intelligent round robin strategy for balancing the load across servers/machines
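The "intelligent round robin" strategy is not defined further here; one common reading is a weighted round robin in which machines with more capacity receive proportionally more requests. A sketch under that assumption:

```python
import itertools

SERVERS = {"app-1": 3, "app-2": 2, "app-3": 1}  # hypothetical server -> relative capacity (weight)

def weighted_round_robin(servers):
    """Cycle through servers in proportion to their weights."""
    expanded = [name for name, weight in servers.items() for _ in range(weight)]
    return itertools.cycle(expanded)

balancer = weighted_round_robin(SERVERS)
print([next(balancer) for _ in range(6)])  # ['app-1', 'app-1', 'app-1', 'app-2', 'app-2', 'app-3']
```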
- User is able to log in -> authentication
- To verify that the userId in a request belongs to the caller, check whether the auth token belongs to that same user
- Authorization -> whether the user can access particular APIs/services with their credentials
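A sketch of those two checks using a signed token that embeds the user id; the token format and secret are illustrative, not a specific library's API.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # illustrative

def issue_token(user_id):
    """Authentication: issue 'user_id.signature' after a successful login."""
    signature = hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()
    return "%s.%s" % (user_id, signature)

def authorize(token, requested_user_id):
    """Authorization: the token must be valid AND belong to the user the request targets."""
    try:
        user_id, signature = token.split(".", 1)
    except ValueError:
        return False
    expected = hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected) and user_id == requested_user_id

token = issue_token("user_42")
print(authorize(token, "user_42"))  # True
print(authorize(token, "user_99"))  # False: valid token, but for a different user
```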
- 80-20 Rule
- 80% of requests will query only 20% of the data, and the remaining 20% of requests will query the other 80% of the data
- Based on the number of followers, decide which data should be cached
- User Cache -> Cache profiles of celebrities and famous personalities
- Tweet cache -> Store the data for recent tweets from famous personalities
- Eviction Strategy -> LRU(Least Recently used)
- Use a write-around cache -> writes go to the database, and the cache is populated when a read misses (see the sketch below)
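A sketch of that cache behaviour: an LRU cache on the read path, with write-around writes that go straight to the database so the cache is only filled on read misses. The capacity and the in-memory stand-in for the database are illustrative.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

db = {}                                    # stand-in for the tweet/user store
tweet_cache = LRUCache(capacity=1000)

def write_tweet(tweet_id, content):
    """Write-around: write only to the database, bypassing the cache."""
    db[tweet_id] = content

def read_tweet(tweet_id):
    cached = tweet_cache.get(tweet_id)
    if cached is not None:
        return cached
    content = db.get(tweet_id)              # cache miss -> read from the database
    if content is not None:
        tweet_cache.put(tweet_id, content)  # populate the cache on the read path
    return content
```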
- Monitoring:
- Monitor the latency of each of the APIs
- Monitor success and failure rates of the APIs
- QPS for each of the APIs
- Monitoring of caches
- Alerting:
- Alert on failures if requests fail X times within a time window (1 hour, 20 mins, 10 mins)
- Alert on latency -> if X% of the total requests within a window take more than t ms to return a result
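A sketch of the two alert rules, evaluated over one window of request records; the failure count, latency threshold t, and percentage X are the tunables mentioned above, given illustrative defaults here.

```python
def should_alert_on_failures(request_results, max_failures=100):
    """Alert when more than max_failures requests in the window failed."""
    failures = sum(1 for r in request_results if not r["success"])
    return failures > max_failures

def should_alert_on_latency(request_results, threshold_ms=500, max_slow_pct=5.0):
    """Alert when more than max_slow_pct percent of requests took longer than threshold_ms."""
    if not request_results:
        return False
    slow = sum(1 for r in request_results if r["latency_ms"] > threshold_ms)
    return 100.0 * slow / len(request_results) > max_slow_pct

# Example window (one record per request received in the last 10 minutes).
window = [{"success": True, "latency_ms": 120}, {"success": False, "latency_ms": 900}]
print(should_alert_on_failures(window, max_failures=0))                    # True: 1 failure > 0
print(should_alert_on_latency(window, threshold_ms=500, max_slow_pct=25))  # True: 50% slow > 25%
```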
- Backups
- Retrieval of data that could be lost because of bugs
- Product Metrics on backups
- Create metrics for experiments on backups
- Analytics