Rate Limit - App-vNext/Polly GitHub Wiki

Rate-Limit (v7.2.3 onwards)

ℹī¸ This documentation describes the previous Polly v7 API. If you are using the new v8 API, please refer to pollydocs.org.

Purpose

Provides a policy which limits the number of executions which can be performed during a rolling window of time.

Premise: 'One user shouldn't monopolise all the resources'

Rate limiting is a commonly used pattern to control the rate at which an operation can be performed within a given window of time.

One use-case for rate limiting is to help ensure fair usage of a shared resource, for example a web server or database. If a rate limit for requests to a web server is assigned to each user of the server, then resources can be fairly shared between the different users. This helps prevent a single user from degrading the server's performance for other users by issuing more requests than is considered reasonable for the server's purpose.

In HTTP, requests that are rate limited are commonly given an HTTP 429 Too Many Requests status code. The response for such a request may also include a Retry-After response header, giving the client an indication of how long to wait before making a new request. Further requests issued before the time indicated by the value of a Retry-After header are likely to fail with the same HTTP 429 status code.

An example of a real-world service that uses rate limiting that you may be familiar with is the GitHub API.

Syntax

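using Polly;
using Polly.RateLimit;

// Up to 20 executions per second, for delegates with no result.
RateLimitPolicy rateLimit = Policy
    .RateLimit(20, TimeSpan.FromSeconds(1));

// The same limit, for delegates returning MyResult.
RateLimitPolicy<MyResult> rateLimitOfT = Policy
    .RateLimit<MyResult>(20, TimeSpan.FromSeconds(1));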

The above examples will create a rate-limit policy which will allow up to 20 executions of any action handled by the policy within a 1 second window.

Syntax examples given are sync; comparable async overloads exist for asynchronous operation: see the README and wiki.
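As a rough sketch of the asynchronous form, the following assumes the Policy.RateLimitAsync overloads mirror the synchronous ones. AsyncSyntaxExample and GetValueAsync are hypothetical names used only for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.RateLimit;

public static class AsyncSyntaxExample
{
    // Up to 20 executions per second for asynchronous delegates.
    public static readonly AsyncRateLimitPolicy RateLimit = Policy
        .RateLimitAsync(20, TimeSpan.FromSeconds(1));

    // Hypothetical operation, standing in for real asynchronous work.
    private static Task<int> GetValueAsync() => Task.FromResult(42);

    // Executes the operation through the policy; throws
    // RateLimitRejectedException if the call is rate limited.
    public static Task<int> GetValueRateLimitedAsync() =>
        RateLimit.ExecuteAsync(() => GetValueAsync());
}
```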

Parameters

  • numberOfExecutions: The positive non-zero number of executions, N, permitted per timespan.
  • perTimeSpan: How often N executions are permitted by the policy.
  • maxBurst (optional): The maximum number of executions that will be permitted in a single burst (for example if none have been executed for a while).
  • retryAfterFactory (optional): A factory to express the recommended retry-after time for the caller when an operation is rate limited.
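The optional parameters might be combined as in the sketch below. It assumes the generic RateLimit<TResult> overload that accepts both maxBurst and retryAfterFactory, and uses HttpResponseMessage as the result type purely for illustration. ParameterExample is a hypothetical name, and the limits are arbitrary.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using Polly;
using Polly.RateLimit;

public static class ParameterExample
{
    public static readonly RateLimitPolicy<HttpResponseMessage> RateLimit = Policy
        .RateLimit<HttpResponseMessage>(
            20,                       // numberOfExecutions
            TimeSpan.FromSeconds(1),  // perTimeSpan
            10,                       // maxBurst
            (retryAfter, context) =>  // retryAfterFactory
            {
                // Instead of throwing, return a 429 response that tells
                // the caller how long to wait, rounded up to whole seconds.
                var response = new HttpResponseMessage((HttpStatusCode)429);
                response.Headers.RetryAfter = new RetryConditionHeaderValue(
                    TimeSpan.FromSeconds(Math.Ceiling(retryAfter.TotalSeconds)));
                return response;
            });
}
```

When a retryAfterFactory is supplied in this way, a rate-limited execution returns the factory's result to the caller rather than throwing.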

Throws

  • RateLimitRejectedException when an execution is rate limited. The thrown exception has a RetryAfter property containing a TimeSpan that specifies the recommended retry-after time for the caller.
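A minimal sketch of handling the rejection follows. RejectionExample and TryExecute are hypothetical helpers, not part of Polly; only RateLimitRejectedException and its RetryAfter property come from the library.

```csharp
using System;
using Polly;
using Polly.RateLimit;

public static class RejectionExample
{
    // Returns TimeSpan.Zero if the action ran, otherwise the
    // recommended time to wait before trying again.
    public static TimeSpan TryExecute(RateLimitPolicy policy, Action action)
    {
        try
        {
            policy.Execute(action);
            return TimeSpan.Zero;
        }
        catch (RateLimitRejectedException ex)
        {
            return ex.RetryAfter;
        }
    }
}
```

A caller could use the returned value to delay a retry, or to populate a Retry-After header on an HTTP response.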

Operation

Each time the policy is executed successfully, one token is taken from the bucket of capacity available to the rate-limit policy for the current timespan. As executions continue within the current window of time, tokens continue to be deducted from the bucket.

If the number of tokens in the bucket reaches zero before the window elapses, further executions of the policy will be rate limited, and a RateLimitRejectedException will be thrown to the caller for each subsequent execution. The rate limiting will continue until the duration of the current window has elapsed.

When the current window elapses, a new window of execution will begin when the next execution occurs, at which point the number of tokens in the bucket will be refilled. Any requests that were being limited during the previous window will now start to successfully execute again until the refilled bucket of tokens is exhausted again.

This cycle of token depletion and replenishment will continue as executions through the policy occur.

If a burst capacity is configured for the policy, an additional number of executions will be allowed per window, allowing for a degree of uneven usage. Otherwise, the executions available for the window will be smeared, that is, averaged, evenly across the whole of each timespan applied to the policy.
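The cycle of depletion and replenishment described above can be sketched with a simplified, single-threaded token bucket. This is an illustration only and is not Polly's implementation: SimpleTokenBucket and TryTake are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow.

```csharp
using System;

// A simplified token bucket for illustration; not thread-safe.
public sealed class SimpleTokenBucket
{
    private readonly TimeSpan _interval;  // time between token refills
    private readonly int _capacity;       // most tokens the bucket holds (the burst)
    private int _tokens;
    private TimeSpan _nextRefill;

    public SimpleTokenBucket(int numberOfExecutions, TimeSpan perTimeSpan, int maxBurst = 1)
    {
        // Smear the allowance evenly: one token per (perTimeSpan / N).
        _interval = TimeSpan.FromTicks(perTimeSpan.Ticks / numberOfExecutions);
        _capacity = maxBurst;
        _tokens = maxBurst;               // start with a full bucket
        _nextRefill = _interval;
    }

    // 'now' is the time elapsed since the bucket was created.
    public bool TryTake(TimeSpan now)
    {
        // Add one token for each whole interval that has elapsed,
        // never exceeding the bucket's capacity.
        while (now >= _nextRefill)
        {
            _tokens = Math.Min(_capacity, _tokens + 1);
            _nextRefill += _interval;
        }

        if (_tokens == 0)
        {
            return false;                 // rate limited
        }

        _tokens--;
        return true;
    }
}
```

With 10 executions per second and no burst, this sketch permits one execution per 100 milliseconds: an attempt at 10 milliseconds after a successful one at 0 is refused, while an attempt at 100 milliseconds succeeds.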

Usage recommendations

Reuse policies across requests

Policies should be reused across requests, as with circuit breaker policies. Otherwise, a rate limit policy that is created for each request, used once and then discarded is likely to have the practical effect of never rate limiting anything.
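One way to share a policy across requests is to hold it in a static field, as in this sketch. OrderService and PlaceOrder are hypothetical names standing in for your own request-handling code.

```csharp
using System;
using Polly;
using Polly.RateLimit;

public class OrderService
{
    // One policy instance shared by every request handled by this class.
    private static readonly RateLimitPolicy SharedRateLimit = Policy
        .RateLimit(20, TimeSpan.FromSeconds(1));

    public void PlaceOrder(Action placeOrder)
    {
        // Avoid creating the policy here: a policy built per call would
        // be used once and discarded, so it would never reject anything.
        SharedRateLimit.Execute(placeOrder);
    }
}
```

Registering the policy as a singleton in a dependency injection container would be another way to achieve the same sharing.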

Shard rate limits by user

Rate limits should be sharded per user, such as by user ID. Otherwise if a rate limit is shared equally by all callers through a specific policy, one disruptive caller can cause all callers executing through the policy to be rate limited.

Such behaviour is likely to be counter to the desired effect of using the rate limit in the first place, allowing a single badly-behaved client to disrupt all users more easily than if they had just over-used their share of resources without a rate limit policy in place.

If clients infrequently access the code using the policies, you should consider caching the policies for each shard/user and evicting them if they are not used for a period of time, such as after a period approximately equal to (but greater than) the configured rate limit timespan. Otherwise, over time, your application will accumulate rate limit policies for users who have "gone away" and are not using any resources at all.
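Per-user sharding with eviction could be sketched as follows. This assumes the Microsoft.Extensions.Caching.Memory package is available; PerUserRateLimits, the cache key format and the limits chosen are all hypothetical.

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;
using Polly;
using Polly.RateLimit;

public sealed class PerUserRateLimits
{
    private static readonly TimeSpan Window = TimeSpan.FromSeconds(1);
    private readonly IMemoryCache _cache;

    public PerUserRateLimits(IMemoryCache cache) => _cache = cache;

    // Returns the policy for this user, creating it on first use.
    public RateLimitPolicy GetPolicy(string userId)
    {
        return _cache.GetOrCreate("rate-limit:" + userId, entry =>
        {
            // Evict the policy once it has gone unused for somewhat
            // longer than the rate limit timespan.
            entry.SlidingExpiration = Window + Window;
            return Policy.RateLimit(20, Window, 5);
        });
    }
}
```

GetOrCreate does not guarantee that the factory runs only once under concurrency, so two racing first requests for the same user may briefly see different policy instances. If that matters for your scenario, guard creation with your own synchronisation.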

Shard rate limits by protected resource

Consider additionally sharding your rate limits by the resource you wish to protect with the rate limit policy. For example, you could have one high rate limit for an inexpensive code path, and another lower rate limit for an expensive code path.

Sharding the rate limit policies in this way allows you to scope the unavailability of your service to a user due to rate limiting to specific areas, leading to a better overall user experience.

For example, separating the rate limits for reads and writes in a web application would allow a user clicking a button to perform a write operation too often to be rate limited on that activity, while still allowing the UI for the application to render without that also being rate limited if the user navigates its pages.
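A sketch of separate read and write limits is shown below. ResourceRateLimits is a hypothetical name and the numbers are arbitrary; in practice these policies would usually also be sharded per user.

```csharp
using System;
using Polly;
using Polly.RateLimit;

public static class ResourceRateLimits
{
    // Inexpensive code path: a generous limit with room for bursts.
    public static readonly RateLimitPolicy Reads = Policy
        .RateLimit(100, TimeSpan.FromSeconds(1), 20);

    // Expensive code path: a much lower limit.
    public static readonly RateLimitPolicy Writes = Policy
        .RateLimit(5, TimeSpan.FromSeconds(1), 2);
}
```

Because each policy keeps its own bucket, exhausting Writes leaves the tokens available to Reads untouched.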

Allow for bursts

Consider configuring a burst when creating a rate limit policy. This prevents all executions through the policy from following a strict X tokens per unit of time enforcement of the rate limit.

For example, a policy that allows a rate limit of 10 executions per second will only allow an execution to succeed once every 100 milliseconds. For a user wishing to perform a one-off operation to fetch 2 resources from an application with a single digit millisecond response time, this would require the user to perform the two fetches at least 100 milliseconds apart, otherwise they would be rate limited, even though without a rate limit this would only take, say, 20 milliseconds.

This behaviour is likely undesirable, so configuring an appropriate burst limit allows a well-behaved client to perform more executions before hitting the rate limit. In this example, with a burst rate of 5, a client could execute 5 times within 100 milliseconds successfully, with only the 7th execution within that second being rate limited (an initial burst of 5 executions, plus 1 execution per 100 milliseconds).
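The effect of a burst can be observed with a small helper that counts how many back-to-back executions a policy permits. BurstExample and CountPermitted are hypothetical names used only for this sketch.

```csharp
using System;
using Polly;
using Polly.RateLimit;

public static class BurstExample
{
    // Attempts 'attempts' executions in quick succession and
    // returns how many the policy permitted.
    public static int CountPermitted(RateLimitPolicy policy, int attempts)
    {
        int permitted = 0;
        for (int i = 0; i < attempts; i++)
        {
            try
            {
                policy.Execute(() => { });
                permitted++;
            }
            catch (RateLimitRejectedException)
            {
                // Rate limited: no token was available for this attempt.
            }
        }
        return permitted;
    }
}
```

For a policy created with Policy.RateLimit(10, TimeSpan.FromSeconds(1), 5), a handful of attempts made within a few milliseconds would be expected to see about 5 permitted, whereas the same limit without a burst would be expected to permit about 1.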

Thread safety and policy reuse

Thread safety

The operation of RateLimitPolicy and AsyncRateLimitPolicy is thread-safe: multiple calls may safely be placed concurrently through a policy instance.
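As an illustration of concurrent use, the sketch below places many executions in parallel through one shared policy instance. ConcurrencyExample and CountPermittedInParallel are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.RateLimit;

public static class ConcurrencyExample
{
    // Runs 'attempts' executions in parallel through one shared
    // policy and returns how many were permitted.
    public static int CountPermittedInParallel(RateLimitPolicy policy, int attempts)
    {
        int permitted = 0;
        Parallel.For(0, attempts, _ =>
        {
            try
            {
                policy.Execute(() => { });
                Interlocked.Increment(ref permitted);
            }
            catch (RateLimitRejectedException)
            {
                // Rejected: the shared bucket had no tokens left.
            }
        });
        return permitted;
    }
}
```

No locking is needed around the policy itself; only the counter local to this sketch is updated with Interlocked.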

Policy reuse

RateLimitPolicy and AsyncRateLimitPolicy instances are intended to be re-used across multiple call sites. Otherwise rate limits to a shared resource will not be applied in the manner you would perhaps expect.

⚠ī¸ **GitHub.com Fallback** ⚠ī¸