Rate Limiter - rFronteddu/general_wiki GitHub Wiki

Rate Limiter

System requirements

Functional Requirements

Limit number of requests an entity can send to an API within a time window (e.g. 15 requests per second)
APIs accessible through a cluster, rate limit should be considered across different servers.
Users should get an error message whenever the defined threshold is crossed within a single or across a a combination of servers.

Non functional

System highly available. Rate limiter must always work since it protects our service from external attacks.
Rate limiter should not introduce substantial latencies that affect user experience.

How to do rate limiting

Rate limiting is a process that is used to define the rate and speed at which consumers can access APIs.
Throttling is the process of controlling the usage of the APIs by customers during a given period. When a throttle limit is crossed, the server should return HTTP status 429 - too many requests.

Different types of throttling

Hard throttling: Number of requests cannot exceed throttle limit
Soft: We can set the limit to be around a threshold
Elastic/Dynamic: Requests can go over if there are resources

Algorithms for rate limiting

Fixed Window: limit in a fixed period (for example, every 0-60 seconds reset the counters regardless of when the call is done).
Rolling window: Time window is considered from the fraction of the time at which the request is made plus the window length.

High level design

The rate limiter will decide which requests will be served by the API server and which will be declined. Once a new request arrives, the Web Server first asks the Rate Limiter if it will be served or throttled.

RL high level

Basic System Design and Algorithm

For each unique user, we need to keep a count representing how many requests the user has made and a timestamp indicating when we started counting the requests. We can keep it in a hash table, where the key can be the UserID and the value can be a structure containing an integer for the Count and an Integer for the Epoch time.

Upon receiving a request, if there is no UserID we add if there is, we check if current - start time is larger than the window, if it is, we set the value to 1 and allow the request, otherwise we increment the count counter, if we are over the rate limit threshold we reject the request.

Some problems

This fixed window algorithm can be abused potentially to send double requests by sending 6 consecutive requests, three at the end of the interval before the refresh and 3 right after. A sliding window fixes this problem.
Read and write behavior can create race conditions. A client could send multiple requests trying to exploit update times. We need to lock the counter for each transaction.

How much memory

Assume simple solution, all data in a hash-table.
UserID = 8 bytes
Count = 2 bytes