Retry with jitter - App-vNext/Polly GitHub Wiki

Retry with jitter

ℹī¸ This documentation describes the previous Polly v7 API. If you are using the new v8 API, please refer to pollydocs.org.

A well-known retry strategy is exponential backoff, in which the first retries are made quickly and later ones at progressively longer intervals: for example, after 2, 4, 8, 15, then 30 seconds.

In high-throughput scenarios, it can also be beneficial to add jitter to wait-and-retry strategies, to prevent retries bunching into further spikes of load.

The problem

Consider a call path experiencing hundreds of calls per second where the underlying system suddenly develops a problem. A fixed-progression exponential backoff can still generate peaks of load.

If the underlying system fails at time t, for example, and you have a fixed-progression backoff strategy generating retries after 2-, 4-, 8-, 15- and 30-second intervals, you can expect to generate further specific jumps in load at t + 2, t + 6, t + 14 etc. (In reality, these specific time values will be augmented by the time it takes calls to time out; the concern is the principle of further spikes in load occurring at fixed intervals.)
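The arithmetic behind those spike times is a running sum of the wait intervals. The following is a minimal sketch; `RetrySpikes` and `RetryOffsets` are hypothetical names used only for illustration (they are not part of Polly), and the sketch ignores the time calls take to time out.

```csharp
using System;
using System.Collections.Generic;

public static class RetrySpikes
{
    // Hypothetical helper, not part of Polly: given fixed wait intervals in seconds,
    // returns the offsets from the failure time t at which all callers retry together.
    public static List<int> RetryOffsets(IEnumerable<int> waitSeconds)
    {
        var offsets = new List<int>();
        var elapsed = 0;
        foreach (var wait in waitSeconds)
        {
            elapsed += wait;      // each wait starts only after the previous retry has fired
            offsets.Add(elapsed);
        }
        return offsets;
    }
}
```

Calling `RetrySpikes.RetryOffsets(new[] { 2, 4, 8, 15, 30 })` gives the offsets 2, 6, 14, 29 and 59: every caller that failed at t retries at exactly those instants, which is what produces the repeated spikes.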

Jitter is a decorrelation strategy which adds randomness to retry intervals to spread out load and avoid spikes.

Simple jitter

Jitter can be achieved using any of the .WaitAndRetry(...) overloads which allow you to specify a Func<..., TimeSpan> for calculating the duration of wait before retrying. (More complex overloads also exist, and similar overloads for async retry.)

To illustrate the principle, a simple jitter could be achieved by adding a randomly-varying extra delay to the wait before retrying:

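using System;
using System.Net.Http;
using Polly;

Random jitterer = new Random(); // System.Random is not thread-safe; guard it if the policy runs concurrently
var retryPolicy = Policy
  .Handle<HttpRequestException>() // etc
  .WaitAndRetry(5,
      retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt))  // exponential back-off: 2, 4, 8 etc seconds
                    + TimeSpan.FromMilliseconds(jitterer.Next(0, 1000)) // plus some jitter: up to 1 second
  );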

Note: the above is an intentionally unsophisticated approach simply to illustrate the concept. See below for production recommendations.
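Because the wait calculation is just a function from retry attempt to TimeSpan, it can be pulled out and checked on its own. The sketch below is one possible way to do that, not a Polly API: `JitteredBackoff` and `Delay` are hypothetical names, and the lock is an added assumption to cover the case where a single policy instance is executed from several threads, since System.Random instances are not thread-safe.

```csharp
using System;

public static class JitteredBackoff
{
    private static readonly Random Jitterer = new Random();
    private static readonly object Gate = new object();

    // Exponential back-off (2, 4, 8 ... seconds) plus under 1 second of random jitter.
    public static TimeSpan Delay(int retryAttempt)
    {
        int jitterMs;
        lock (Gate) // serialise access: System.Random is not thread-safe
        {
            jitterMs = Jitterer.Next(0, 1000); // 0..999 milliseconds
        }

        return TimeSpan.FromSeconds(Math.Pow(2, retryAttempt))
             + TimeSpan.FromMilliseconds(jitterMs);
    }
}
```

The function can then be passed to a wait-and-retry overload, for example `.WaitAndRetry(5, retryAttempt => JitteredBackoff.Delay(retryAttempt))`. It remains the same deliberately simple jitter as above, so the production recommendations below still apply.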

More complex jitter

For production usage, we recommend one of the jitter algorithms offered in Polly.Contrib.WaitAndRetry.

The Polly team originally recommended a widely-referenced jitter strategy described here. However, analysis by community members suggested that it still exhibits peaks and bunching (see fifth graph from top in that link), especially at the first retry.

We now recommend a new strategy delivered in Polly.Contrib.WaitAndRetry. This provides an overall smoother distribution of jittered retry timings, as shown in the third-from-bottom graph in this link.

For configuration, usage and deeper info, see Polly.Contrib.WaitAndRetry.
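As a rough sketch of how this fits together with the Polly v7 API, assuming the Polly and Polly.Contrib.WaitAndRetry NuGet packages are referenced: the class name `JitteredRetry`, the one-second median first delay and the retry count of 5 are illustrative choices, not recommendations from this page.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using Polly;
using Polly.Contrib.WaitAndRetry;
using Polly.Retry;

public static class JitteredRetry
{
    // Builds an async retry policy whose waits come from the contrib package's jittered backoff.
    public static AsyncRetryPolicy Create()
    {
        // Five jittered delays; the median first delay of 1 second is an illustrative value.
        IEnumerable<TimeSpan> delays = Backoff.DecorrelatedJitterBackoffV2(
            medianFirstRetryDelay: TimeSpan.FromSeconds(1),
            retryCount: 5);

        return Policy
            .Handle<HttpRequestException>() // etc
            .WaitAndRetryAsync(delays);
    }
}
```

The policy is then used like any other v7 async policy, for example `await JitteredRetry.Create().ExecuteAsync(() => client.GetAsync(url))`, where `client` and `url` stand in for your own HttpClient and address.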