timeline service - jamongx/twitter-clone GitHub Wiki

This service will handle the generation and publishing of user newsfeeds. It will be discussed in detail separately.

Database

NoSQL (Cassandra)
To handle the generation and publishing of user newsfeeds, a combination of a caching layer and a database can be effective.

Caching

For the caching layer, you can use solutions such as Redis or Memcached, which provide high-speed data access.
For the database, a NoSQL database such as MongoDB or Cassandra is suitable for storing and retrieving the newsfeed data.

Table

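UserTimeline: user_id (partition key), tweet_id (second component of the composite primary key), timestamp
HomeTimeline: user_id (partition key), tweet_id (second component of the composite primary key), timestamp
In both tables, user_id and tweet_id refer to a user and a tweet. Cassandra does not enforce foreign keys, so these references are logical only.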

feeds

This table stores feed properties with the corresponding userID.

feeds tweets

This table maps tweets to feeds (N:M relationship).

The newsfeed seems easy enough to implement, but many details can make or break this feature. So, let's divide the problem into two parts:

Generation

Suppose we want to generate the feed for user A. We will perform the following steps:

Retrieve the IDs of all the users and entities (hashtags, topics, etc.) user A follows.
Fetch the relevant tweets for each of the retrieved IDs.
Use a ranking algorithm to rank the tweets based on parameters such as relevance, time, engagement, etc.
Return the ranked tweets data to the client in a paginated manner.
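
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the service's actual implementation: the `Tweet` fields, the `score` formula (engagement discounted by age), and the in-memory dictionaries standing in for the follow graph and tweet store are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author_id: str
    created_at: float  # Unix timestamp in seconds (assumed field)
    likes: int = 0     # stand-in for "engagement" (assumed field)


def score(tweet, now):
    """Placeholder ranking: engagement discounted by age in hours."""
    age_hours = max(now - tweet.created_at, 0.0) / 3600.0
    return (1 + tweet.likes) / (1.0 + age_hours)


def generate_feed(user_id, follows, tweets_by_author, now, page=0, page_size=2):
    # Step 1: IDs of all users and entities this user follows.
    followed = follows.get(user_id, set())
    # Step 2: fetch the tweets for each followed ID.
    candidates = [t for fid in followed for t in tweets_by_author.get(fid, [])]
    # Step 3: rank the candidates, best score first.
    ranked = sorted(candidates, key=lambda t: score(t, now), reverse=True)
    # Step 4: return one page of the ranked result.
    start = page * page_size
    return ranked[start:start + page_size]


follows = {"A": {"B", "C"}}
tweets_by_author = {
    "B": [Tweet(1, "B", created_at=0.0, likes=9)],
    "C": [Tweet(2, "C", created_at=3600.0), Tweet(3, "C", created_at=7200.0, likes=3)],
    "D": [Tweet(4, "D", created_at=7200.0, likes=100)],  # not followed by A
}
first_page = generate_feed("A", follows, tweets_by_author, now=7200.0)
second_page = generate_feed("A", follows, tweets_by_author, now=7200.0, page=1)
```

A real ranking function would weigh many more signals. The point here is only the shape of the pipeline: collect followed IDs, gather candidates, rank, paginate.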

Feed generation is expensive and can be slow, especially for users who follow many accounts. To improve performance, the feed can be pre-generated and stored in the cache, with a mechanism that periodically updates it and applies the ranking algorithm to the new tweets.
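
One possible shape for such a cache is sketched below. It assumes a plain dictionary in place of Redis or Memcached, a hypothetical `build_feed` callback that runs the generation steps, and a `max_age_seconds` freshness limit. None of these names come from the project itself.

```python
from typing import Callable, Dict, List, Tuple


class FeedCache:
    """In-memory stand-in for a cache such as Redis (hypothetical helper)."""

    def __init__(self, build_feed: Callable[[str], List[int]], max_age_seconds: float):
        self._build_feed = build_feed      # runs generation and ranking
        self._max_age = max_age_seconds    # how long a cached feed stays fresh
        self._entries: Dict[str, Tuple[float, List[int]]] = {}

    def refresh(self, user_id: str, now: float) -> List[int]:
        # Regenerate the feed and store it with its generation time.
        feed = self._build_feed(user_id)
        self._entries[user_id] = (now, feed)
        return feed

    def get(self, user_id: str, now: float) -> List[int]:
        entry = self._entries.get(user_id)
        # Rebuild when the feed is missing or older than the freshness limit.
        if entry is None or now - entry[0] > self._max_age:
            return self.refresh(user_id, now)
        return entry[1]


builds = []

def build_feed(user_id):
    builds.append(user_id)
    return [len(builds)]  # dummy feed that changes on every rebuild

cache = FeedCache(build_feed, max_age_seconds=60.0)
first = cache.get("A", now=0.0)     # miss: feed is generated
second = cache.get("A", now=30.0)   # hit: cached feed is reused
third = cache.get("A", now=61.0)    # stale: feed is regenerated
```

Here the refresh happens lazily on read. A background job could call `refresh` on a schedule instead, which is closer to the periodic update described above.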

Publishing

Publishing is the step where feed data is delivered to each user. This can be a heavy operation, as a user may have millions of friends or followers. There are three approaches to deal with this:

Pull Model (or Fan-out on load)

When a user creates a tweet, nothing is written to the followers' feeds. The feed is built and stored in memory only when a follower reloads their newsfeed, so the most recent feed is loaded only when the user requests it. This approach reduces the number of write operations on our database.

The downside of this approach is that the users will not be able to view recent feeds unless they "pull" the data from the server, which will increase the number of read operations on the server.
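
As a rough sketch of the pull model, assuming in-memory dictionaries in place of the UserTimeline table and the follow graph (the class and method names are illustrative only):

```python
import heapq
from collections import defaultdict


class PullTimeline:
    def __init__(self):
        self.following = defaultdict(set)       # user -> accounts they follow
        self.user_timeline = defaultdict(list)  # author -> [(timestamp, tweet_id)]

    def follow(self, follower, followee):
        self.following[follower].add(followee)

    def post(self, author, tweet_id, timestamp):
        # A single write, no matter how many followers the author has.
        self.user_timeline[author].append((timestamp, tweet_id))

    def home_timeline(self, user, limit=10):
        # The read path does the fan-out: one lookup per followed account.
        candidates = (
            item
            for followee in self.following[user]
            for item in self.user_timeline[followee]
        )
        newest = heapq.nlargest(limit, candidates)  # newest timestamps first
        return [tweet_id for _, tweet_id in newest]


pull = PullTimeline()
pull.follow("alice", "bob")
pull.follow("alice", "carol")
pull.post("bob", tweet_id=1, timestamp=100)
pull.post("carol", tweet_id=2, timestamp=200)
pull.post("dave", tweet_id=3, timestamp=300)  # alice does not follow dave
pull.post("bob", tweet_id=4, timestamp=400)
```

Each `post` costs one write, while each `home_timeline` call touches every followed account, which is the read-heavy trade-off described above.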

Push Model (or Fan-out on write)

In this model, once a user creates a tweet, it is "pushed" to all of the followers' feeds immediately. This saves the system from going through the full list of accounts a user follows to check for updates each time the feed is read.

However, the downside of this approach is that it would increase the number of write operations on the database.
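
A comparable sketch of the push model follows. It again uses in-memory structures, and the `max_feed_size` cap and the `writes` counter are assumptions added for illustration, not part of the original design.

```python
from collections import defaultdict, deque


class PushTimeline:
    def __init__(self, max_feed_size=100):
        self.followers = defaultdict(set)  # author -> accounts following them
        # Each home feed keeps only the newest max_feed_size tweet IDs.
        self.home = defaultdict(lambda: deque(maxlen=max_feed_size))
        self.writes = 0                    # counts fan-out writes

    def follow(self, follower, followee):
        self.followers[followee].add(follower)

    def post(self, author, tweet_id):
        # Fan-out on write: one write per follower.
        for follower in self.followers[author]:
            self.home[follower].appendleft(tweet_id)
            self.writes += 1

    def home_timeline(self, user, limit=10):
        # Reading is a single lookup of a feed that is already assembled.
        return list(self.home[user])[:limit]


push = PushTimeline(max_feed_size=2)
push.follow("alice", "bob")
push.follow("carol", "bob")
for tweet_id in (1, 2, 3):
    push.post("bob", tweet_id)
```

Three tweets from an author with two followers produce six writes, which shows how the write volume grows with follower count.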

Hybrid Model

A third approach is a hybrid of the pull and push models. It combines the beneficial features of both and tries to strike a balance between them.

The hybrid model uses the push model only for users with fewer followers. For users with many followers, such as celebrities, the pull model is used.
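
The hybrid rule can be sketched as below. The `celebrity_threshold` parameter and all class and method names are hypothetical; the text does not specify where the cutoff between the two models lies.

```python
from collections import defaultdict


class HybridTimeline:
    def __init__(self, celebrity_threshold):
        self.threshold = celebrity_threshold    # assumed cutoff, not from the source
        self.followers = defaultdict(set)       # author -> followers
        self.following = defaultdict(set)       # user -> followed accounts
        self.user_timeline = defaultdict(list)  # author -> [(timestamp, tweet_id)]
        self.home = defaultdict(list)           # user -> pushed [(timestamp, tweet_id)]

    def follow(self, follower, followee):
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def is_celebrity(self, user):
        return len(self.followers[user]) >= self.threshold

    def post(self, author, tweet_id, timestamp):
        item = (timestamp, tweet_id)
        self.user_timeline[author].append(item)
        if not self.is_celebrity(author):
            # Push path: fan out to every follower at write time.
            for follower in self.followers[author]:
                self.home[follower].append(item)

    def home_timeline(self, user, limit=10):
        # Start from pushed tweets; a set removes duplicates if an author
        # crossed the threshold after some tweets were already pushed.
        merged = set(self.home[user])
        # Pull path: read celebrity tweets at request time.
        for followee in self.following[user]:
            if self.is_celebrity(followee):
                merged.update(self.user_timeline[followee])
        newest = sorted(merged, reverse=True)[:limit]
        return [tweet_id for _, tweet_id in newest]


hybrid = HybridTimeline(celebrity_threshold=2)
hybrid.follow("alice", "star")
hybrid.follow("bob", "star")     # star now has 2 followers: pull model
hybrid.follow("alice", "carol")  # carol has 1 follower: push model
hybrid.post("carol", tweet_id=1, timestamp=100)
hybrid.post("star", tweet_id=2, timestamp=200)
```

Only carol's tweet is written into a follower's feed. The celebrity's tweet is stored once and merged in when a follower reads their timeline.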

Get newsfeed

This API will return all the tweets to be shown within a given newsfeed.

getNewsfeed(userID: UUID): Tweet[]

Parameters

User ID (UUID): ID of the user.

Returns

Tweets (Tweet[]): All the tweets to be shown within a given newsfeed.
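
For illustration, the same contract could look like this in Python. The `Tweet` fields and the `home_timeline` dictionary standing in for the HomeTimeline table are assumptions. Only the parameter (a user UUID) and the return type (a list of tweets) come from the signature above.

```python
from dataclasses import dataclass
from typing import Dict, List
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Tweet:
    tweet_id: UUID
    author_id: UUID
    text: str
    timestamp: float  # assumed field, used for newest-first ordering


def get_newsfeed(user_id: UUID, home_timeline: Dict[UUID, List[Tweet]]) -> List[Tweet]:
    """Return the tweets for the user's newsfeed, newest first."""
    tweets = home_timeline.get(user_id, [])  # unknown users get an empty feed
    return sorted(tweets, key=lambda t: t.timestamp, reverse=True)


user, author = uuid4(), uuid4()
older = Tweet(uuid4(), author, "first tweet", timestamp=1.0)
newer = Tweet(uuid4(), author, "second tweet", timestamp=2.0)
store = {user: [older, newer]}
feed = get_newsfeed(user, store)
```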