4. DESIGN A RATE LIMITER
In a network system, a rate limiter is used to control the rate of traffic sent by a client or a service. In the HTTP world, a rate limiter limits the number of client requests allowed to be sent over a specified period.
Benefits
- Prevents resource starvation caused by Denial of Service (DoS), which occurs when clients send requests too frequently.
- Reduces cost. Limiting excess requests means fewer servers and more resources allocated to high-priority APIs.
- Prevents servers from being overloaded.
Step 1 - Understand the problem and establish design scope
Questions to ask:
- Is it a client-side rate limiter or a server-side API rate limiter?
- Does the rate limiter throttle API requests based on IP, the user ID, or other properties?
- What is the scale of the system?
- Does it need to work in a distributed environment?
Step 2 - Propose high-level design and get buy-in
Where to put the Rate Limiter?
Intuitively, you can implement a rate limiter on either the client or the server side. A client-side implementation has more limitations (the client is an unreliable place to enforce limits), so the rate limiter is better placed on the server side, or in a middleware between the client and the API servers that throttles client requests to the API.
Algorithms for rate limiting
Token bucket
- Each incoming user request takes one token from the bucket; if no tokens are left, the request is dropped. The bucket is refilled at a fixed interval.
- Pros: simple, memory efficient.
- Cons: two parameters must be tuned (bucket size and refill rate), and choosing appropriate values is not easy.
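A minimal sketch of the token bucket idea in Python. The class and method names, the lazy refill on each call, and the use of a monotonic clock are illustrative assumptions, not taken from the original text:

```python
import time

class TokenBucket:
    """Illustrative token bucket: up to `capacity` tokens, refilled at `refill_rate` tokens/sec."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # parameter 1: bucket size
        self.refill_rate = refill_rate    # parameter 2: tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill lazily based on elapsed time instead of running a background timer.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1              # each request consumes one token
            return True
        return False                      # bucket empty -> drop the request

# Example: sustain ~4 requests/sec with bursts of up to 10
limiter = TokenBucket(capacity=10, refill_rate=4)
print(limiter.allow())
```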
Leaking bucket
Implemented with a FIFO queue; if the bucket (queue) is full, the request is dropped.
- Pros:
- Memory efficient given the limited queue size.
- Requests are processed at a fixed rate, so it is suitable for use cases where a stable outflow rate is needed.
- Cons:
- A burst of traffic fills up the queue, and requests arriving afterwards are dropped until the queue drains.
- Two parameters must be tuned (bucket size and outflow rate), and choosing appropriate values is not easy.
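A rough sketch of the leaking bucket with a bounded FIFO queue, assuming a caller-driven `leak()` that runs on a fixed schedule; the class and method names are made up for illustration:

```python
from collections import deque

class LeakingBucket:
    """Illustrative leaking bucket: requests enter a bounded FIFO queue and leave at a fixed rate."""

    def __init__(self, bucket_size: int, outflow_per_tick: int):
        self.queue = deque()
        self.bucket_size = bucket_size            # parameter 1: queue capacity
        self.outflow_per_tick = outflow_per_tick  # parameter 2: requests drained per tick

    def allow(self, request) -> bool:
        if len(self.queue) >= self.bucket_size:
            return False                          # queue full -> drop the request
        self.queue.append(request)
        return True

    def leak(self) -> list:
        # Called on a fixed schedule; drains requests at a constant outflow rate.
        n = min(self.outflow_per_tick, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]  # hand these to the real handler
```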
Fixed window counter algorithm
- Divides the timeline into fixed-size time windows and assigns a counter to each window.
- Each request increments the counter by 1; once the counter exceeds the threshold, further requests are dropped until the next window starts.
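A small sketch of the fixed window counter, with the window derived from integer division of the timestamp; the names and the in-memory dict are illustrative assumptions:

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    """Counts requests per client per fixed-size window; rejects once the threshold is reached."""

    def __init__(self, window_seconds: int, threshold: int):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.counters = defaultdict(int)   # (client_id, window index) -> count

    def allow(self, client_id: str) -> bool:
        window = int(time.time()) // self.window_seconds
        key = (client_id, window)
        if self.counters[key] >= self.threshold:
            return False                   # drop until the next window starts
        self.counters[key] += 1
        return True
        # In practice, stale windows would be expired (e.g. via Redis TTLs) instead of kept here.
```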
Sliding window log algorithm
- Keeps a log of request timestamps per client; when a new request arrives, timestamps older than the current window are removed, and the request is allowed only if the log size stays within the limit.
Sliding window counter algorithm
A hybrid approach that combines the fixed window counter and the sliding window log. Estimated count = requests in the current window + requests in the previous window * overlap percentage of the rolling window with the previous window.
Pros:
- It smooths out spikes in traffic because the rate is based on the average rate of the previous window.
- Memory efficient.
Cons:
- It only works when the look-back window does not need to be strict: it is an approximation of the actual rate because it assumes requests in the previous window were evenly distributed. However, this problem may not be as bad as it seems.
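One way to read the weighted formula above, sketched in Python. It assumes per-window counters are tracked separately (e.g. in Redis); the function and parameter names are hypothetical:

```python
import time
from typing import Optional

def sliding_window_estimate(current_count: int, previous_count: int,
                            window_seconds: int, now: Optional[float] = None) -> float:
    """Estimated request count in the rolling window ending now.

    current_count / previous_count are plain fixed-window counters for the
    current and previous windows, tracked elsewhere.
    """
    now = time.time() if now is None else now
    elapsed_in_current = now % window_seconds
    # Fraction of the rolling window that overlaps the previous fixed window.
    overlap = (window_seconds - elapsed_in_current) / window_seconds
    return current_count + previous_count * overlap

# Example: limit 100 req/min; 30s into the current minute, 40 requests in this
# window and 80 in the previous one -> estimate 40 + 80 * 0.5 = 80, so allow.
allowed = sliding_window_estimate(40, 80, 60, now=30.0) < 100
```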
High-level architecture
- At a high level, we need a counter to track how many requests are sent by the same user, IP address, etc. If the counter is larger than the limit, the request is disallowed.
- The counters are stored in an in-memory cache, because it is fast and supports time-based expiration strategies. Redis is widely used for this purpose.
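A minimal illustration of the counter-in-Redis idea using the redis-py client; the key format and limits are assumptions, and note that the INCR/EXPIRE pair here is not atomic, which is exactly the race condition revisited in Step 3:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def is_allowed(client_id: str, limit: int = 100, window_seconds: int = 60) -> bool:
    key = f"rate:{client_id}"              # one counter per user / IP / API key
    count = r.incr(key)                    # INCR is atomic and creates the key at 1
    if count == 1:
        r.expire(key, window_seconds)      # first request of the window sets the TTL
    return count <= limit
```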
Step 3 - Design deep dive
Rate limiting rules
Rules are generally written in configuration files and saved on disk.
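For illustration only, a set of rules might look like the following once loaded from disk into the middleware. The endpoint names, fields, and limits are all assumptions, written as a Python dict to match the other sketches rather than any specific config format:

```python
# Hypothetical rate limiting rules, normally kept in a configuration file on disk
# and cached by rate limiter workers; every field and endpoint name is illustrative.
RULES = {
    "auth/login":    {"unit": "minute", "requests_per_unit": 5},
    "messages/send": {"unit": "day",    "requests_per_unit": 500},
}

def rule_for(endpoint: str):
    """Look up the rule that applies to an endpoint (None means unthrottled)."""
    return RULES.get(endpoint)
```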
Exceeding the rate limit
- How does a client know whether it is being throttled?
- The rate limiter returns rate-limit information to the client in HTTP response headers.
- A throttled client receives response code 429 (Too Many Requests) together with an X-Ratelimit-Retry-After header (the number of seconds to wait before making another request without being throttled).
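A hedged sketch of what the throttled response could look like, using Flask purely as an example framework; `check_rate_limit` is a stub standing in for any of the algorithms above, and the header names follow the X-Ratelimit-* convention mentioned here:

```python
from flask import Flask, jsonify

app = Flask(__name__)

def check_rate_limit(client_id: str):
    """Stub limiter; returns (allowed, retry_after_seconds)."""
    return False, 30

@app.route("/api/resource")
def resource():
    allowed, retry_after = check_rate_limit("client-123")
    if not allowed:
        resp = jsonify(error="too many requests")
        resp.status_code = 429                              # Too Many Requests
        resp.headers["X-Ratelimit-Retry-After"] = str(retry_after)
        return resp
    return jsonify(ok=True)
```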
Rate limiter in a distributed environment
Race condition
- Two requests handled concurrently both read the same counter value and write back counter+1, so the counter increases by 1 when it should have increased by 2.
- Solutions: Lua scripts [13] and the sorted sets data structure in Redis. Locks would also work, but they are too slow.
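A sketch of the Lua-script approach with redis-py: INCR and EXPIRE run atomically inside one server-side script, so two limiter instances cannot interleave between the read and the write. Key names and limits are illustrative:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Increment the counter and set the window TTL in a single atomic server-side step.
LUA_INCR_WITH_TTL = r.register_script("""
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
""")

def is_allowed_atomic(client_id: str, limit: int = 100, window_seconds: int = 60) -> bool:
    count = LUA_INCR_WITH_TTL(keys=[f"rate:{client_id}"], args=[window_seconds])
    return count <= limit
```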
Synchronization issue
- With multiple rate limiters, each instance does not have the latest counter data for every client.
- Solutions:
- Sticky sessions (always route a client's traffic to the same rate limiter). Not recommended: neither scalable nor flexible.
- Centralized data stores such as Redis.
Performance optimization
- Use a multi-data-center setup, because latency is high for users located far from the data center.
- Synchronize data with an eventual consistency model.
Monitoring
- Gather analytics data to check whether the rate limiting algorithm is effective and whether the rate limiting rules are appropriate.
Step 4 - Wrap up
Additional points:
- Hard vs soft rate limiting.
- Hard: The number of requests cannot exceed the threshold.
- Soft: Requests can exceed the threshold for a short period.
- Rate limiting at different levels. In this chapter, we only talked about rate limiting at the application level (HTTP: layer 7). It is possible to apply rate limiting at other layers. For example, you can apply rate limiting by IP addresses using Iptables.
- Avoid being rate limited. Design your client with best practices:
- Use client cache to avoid making frequent API calls.
- Understand the limit and do not send too many requests in a short time frame.
- Include code to catch exceptions or errors so your client can gracefully recover from exceptions.
- Add sufficient backoff time to retry logic.