Fail fast on certain exception types - rebus-org/Rebus GitHub Wiki

When a message handler throws an exception, Rebus will try to deliver the message again a few times, making 5 delivery attempts in total by default, before moving the message to the error queue (see Automatic retries and error handling for more details).

This is a pretty good default approach, because it doesn't hurt to try consuming a message again in most circumstances. In some cases it just seems pretty silly, though, especially because we – as humans – can often easily see that it doesn't make sense to try again.

A downside of always retrying 5 times is that failed messages take up a huge amount of space in our log files, because each failed delivery attempt is logged like this:

[WRN] Rebus.Retry.ErrorTracking.InMemErrorTracker (Rebus 4 worker 1): Unhandled exception 1 while handling message with ID "488e48bd-20cc-482f-a213-ff8aee3edb82"
MyDomain.DomainException: ( ... full exception details here ...)

[WRN] Rebus.Retry.ErrorTracking.InMemErrorTracker (Rebus 4 worker 1): Unhandled exception 2 while handling message with ID "488e48bd-20cc-482f-a213-ff8aee3edb82"
MyDomain.DomainException: ( ... full exception details here ...)

[WRN] Rebus.Retry.ErrorTracking.InMemErrorTracker (Rebus 4 worker 1): Unhandled exception 3 while handling message with ID "488e48bd-20cc-482f-a213-ff8aee3edb82"
MyDomain.DomainException: ( ... full exception details here ...)

[WRN] Rebus.Retry.ErrorTracking.InMemErrorTracker (Rebus 4 worker 1): Unhandled exception 4 while handling message with ID "488e48bd-20cc-482f-a213-ff8aee3edb82"
MyDomain.DomainException: ( ... full exception details here ...)

[WRN] Rebus.Retry.ErrorTracking.InMemErrorTracker (Rebus 4 worker 1): Unhandled exception 5 while handling message with ID "488e48bd-20cc-482f-a213-ff8aee3edb82"
MyDomain.DomainException: ( ... full exception details here ...)

[ERR] Rebus.Retry.PoisonQueues.PoisonQueueErrorHandler (Rebus 4 worker 1): Moving message with ID "488e48bd-20cc-482f-a213-ff8aee3edb82" to error queue "error"
System.AggregateException: 5 unhandled exceptions ---> ( ... full base exception details here ... )

---> (Inner Exception #0) MyDomain.DomainException: ( ... full exception details here ... )

---> (Inner Exception #1) MyDomain.DomainException: ( ... full exception details here ... )

---> (Inner Exception #2) MyDomain.DomainException: ( ... full exception details here ... )

---> (Inner Exception #3) MyDomain.DomainException: ( ... full exception details here ... )
   
---> (Inner Exception #4) MyDomain.DomainException: ( ... full exception details here ... )

In other words, every message moved to the error queue produces 5 full stack traces at the WARNING log level, plus one AggregateException containing all 5 exceptions at the ERROR log level, so essentially the same stack trace ends up in the log at least 10 times. As you can probably imagine, this takes up a lot of space in the logs.
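To make the duplication concrete, here is a deliberately simplified model of the "retry, then move to the error queue" flow. It is a sketch, not Rebus' implementation: RetryModel and Deliver are hypothetical names, the log lines only imitate the shape of the output above, and maxDeliveryAttempts = 5 mirrors the default mentioned earlier.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of "retry N times, then move to the error queue".
// NOT Rebus' implementation; it only reproduces the shape of the log output above.
public static class RetryModel
{
    // Delivers a message up to maxDeliveryAttempts times and returns the log lines it would produce.
    public static List<string> Deliver(string messageId, Action handler, int maxDeliveryAttempts = 5)
    {
        var log = new List<string>();
        var exceptions = new List<Exception>();

        for (var attempt = 1; attempt <= maxDeliveryAttempts; attempt++)
        {
            try
            {
                handler();
                return log; // handled successfully, nothing more to log
            }
            catch (Exception exception)
            {
                // every failed attempt is logged on its own, with full exception details
                exceptions.Add(exception);
                log.Add($"[WRN] Unhandled exception {attempt} while handling message with ID \"{messageId}\"");
            }
        }

        // all attempts failed: one ERROR entry that wraps every exception caught above
        var aggregate = new AggregateException($"{exceptions.Count} unhandled exceptions", exceptions);
        log.Add($"[ERR] Moving message with ID \"{messageId}\" to error queue \"error\" ({aggregate.InnerExceptions.Count} inner exceptions)");
        return log;
    }
}
```

Each failure is logged once as a warning and then once more inside the aggregate, which is where the repetition in the log comes from.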

The case shown above is an example of this: our code throws a special DomainException, which is the exception we throw when some business logic receives bad input. If we assume that the incoming message is the entire input to this particular business operation, every new delivery attempt will fail in exactly the same way, so it doesn't make sense to try consuming the message again.

How to fail fast

To help reduce the amount of noise in the logs, Rebus has the concept of "fail fast exceptions". An exception is considered a "fail fast exception" when Rebus' IFailFastChecker says so, i.e. when its ShouldFailFast method returns true for that exception.

By default, Rebus checks whether the thrown exception implements the IFailFastException marker interface, so the easiest (but most intrusive) way to make Rebus fail fast on your exceptions is to implement that interface on your exception type.

The more flexible and less intrusive way is to decorate Rebus' IFailFastChecker and add your own fail-fast rules on top of the default behavior.

The two approaches are demonstrated here.

Implement marker interface

The easiest way to fail fast on a particular exception type is to add the marker interface IFailFastException to the exception, like this:

[Serializable]
public class DomainException : Exception, IFailFastException
{
    // ... exception stuff in here
}

This will make Rebus fail like this instead:

[WRN] Rebus.Retry.ErrorTracking.InMemErrorTracker (Rebus 2 worker 1): Unhandled exception 1 (FINAL) while handling message with ID "819d7779-43e3-4ec7-a2ca-1e6aed82fbf5"
MyDomain.DomainException: ( ... full exception details here ... )

[ERR] Rebus.Retry.PoisonQueues.PoisonQueueErrorHandler (Rebus 2 worker 1): Moving message with ID "819d7779-43e3-4ec7-a2ca-1e6aed82fbf5" to error queue "error"
System.AggregateException: 1 unhandled exceptions ---> MyDomain.DomainException: ( ... full exception details here ... )

---> (Inner Exception #0) MyDomain.DomainException: ( ... full exception details here ... )

which is much prettier. 🙂 Note the text (FINAL) in the log above – it marks the point where Rebus decides that the exception should short-circuit the error tracking and mark the message as having failed too many times.
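The short-circuit can be illustrated with a small self-contained model. IFailFastException and IFailFastChecker below are local stand-ins that mirror the names used on this page (the ShouldFailFast signature is the one from the decorator example further down); they are declared here only so the sketch compiles without Rebus and are not the real Rebus types. MarkerInterfaceFailFastChecker and FailFastRetryModel are hypothetical names, the stand-in checker only looks at the marker interface, and the real error tracking does more than this.

```csharp
using System;

// Local stand-ins mirroring the shape of Rebus' IFailFastException and IFailFastChecker.
// Declared here only so the sketch compiles on its own; they are NOT the Rebus types.
public interface IFailFastException { }

public interface IFailFastChecker
{
    bool ShouldFailFast(string messageId, Exception exception);
}

// Hypothetical default: fail fast only if the exception carries the marker interface.
public class MarkerInterfaceFailFastChecker : IFailFastChecker
{
    public bool ShouldFailFast(string messageId, Exception exception) => exception is IFailFastException;
}

// Example exception type that opts in to failing fast via the marker interface.
public class DomainException : Exception, IFailFastException
{
    public DomainException(string message) : base(message) { }
}

public static class FailFastRetryModel
{
    // Returns the number of delivery attempts made before the message succeeded or was given up on.
    public static int CountAttempts(string messageId, Action handler, IFailFastChecker checker, int maxDeliveryAttempts = 5)
    {
        for (var attempt = 1; attempt <= maxDeliveryAttempts; attempt++)
        {
            try
            {
                handler();
                return attempt;
            }
            catch (Exception exception) when (checker.ShouldFailFast(messageId, exception))
            {
                // the "(FINAL)" case: stop retrying and treat the message as failed right away
                return attempt;
            }
            catch (Exception)
            {
                // ordinary failure: move on to the next delivery attempt
            }
        }

        return maxDeliveryAttempts;
    }
}
```

With this model, a handler throwing DomainException is attempted once, while a handler throwing, say, a TimeoutException is attempted all 5 times.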

Decorate IFailFastChecker

A less intrusive way of getting Rebus to fail fast on certain exception types is to extend the default behavior by installing a decorator. This approach is preferable when you don't want the project that defines your exception type to reference Rebus, or when the exception type is defined somewhere else (e.g. in a third-party library).

You install the decorator in the usual Rebus way:

Configure.With(...)
    .(...)
    .Options(o => o.UseMyFailFastChecker())
    .Start();

and then you declare a new configuration extension somewhere else:

static class MyFailFastCheckerConfigurationExtensions
{
    public static void UseMyFailFastChecker(this OptionsConfigurer configurer)
    {
        configurer.Decorate<IFailFastChecker>(c => {
            var failFastChecker = c.Get<IFailFastChecker>();
            return new MyFailFastChecker(failFastChecker);
        });
    }
}

The decorator could then look like this:

class MyFailFastChecker : IFailFastChecker
{
    readonly IFailFastChecker _failFastChecker;

    public MyFailFastChecker(IFailFastChecker failFastChecker)
    {
        _failFastChecker = failFastChecker;
    }

    public bool ShouldFailFast(string messageId, Exception exception)
    {
        switch (exception)
        {
            // fail fast on our domain exception
            case DomainException _: return true;

            // fail fast if table doesn't exist, or we don't have permission
            case SqlException sqlException when sqlException.Number == 3701: return true;

            // delegate all other behavior to default
            default: return _failFastChecker.ShouldFailFast(messageId, exception);
        }
    }
}
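Because SqlException has no public constructor, the decorator above is awkward to exercise in a quick test. The sketch below is a self-contained variant under stated assumptions: IFailFastChecker and IFailFastException are local stand-ins mirroring the names on this page (not the Rebus types), DefaultFailFastChecker is a hypothetical default that only honors the marker interface, and DatabaseException is a hypothetical substitute for SqlException that carries a Number. The switch itself is the same as in MyFailFastChecker above.

```csharp
using System;

// Local stand-ins so this sketch compiles without Rebus or a SQL Server driver (NOT the Rebus types).
public interface IFailFastChecker
{
    bool ShouldFailFast(string messageId, Exception exception);
}

public interface IFailFastException { }

// Hypothetical default checker: only honors the marker interface.
public class DefaultFailFastChecker : IFailFastChecker
{
    public bool ShouldFailFast(string messageId, Exception exception) => exception is IFailFastException;
}

// Domain exception WITHOUT the marker interface, e.g. defined in a project that doesn't reference Rebus.
public class DomainException : Exception
{
    public DomainException(string message) : base(message) { }
}

// Hypothetical substitute for SqlException, carrying a SQL Server error number.
public class DatabaseException : Exception
{
    public DatabaseException(int number) : base($"Database error {number}") => Number = number;

    public int Number { get; }
}

// Exception using the marker interface, to show that the decorator still delegates to the default.
public class MarkedException : Exception, IFailFastException { }

// Same decorator logic as MyFailFastChecker above, expressed against the stand-in types.
public class MyFailFastChecker : IFailFastChecker
{
    readonly IFailFastChecker _failFastChecker;

    public MyFailFastChecker(IFailFastChecker failFastChecker) => _failFastChecker = failFastChecker;

    public bool ShouldFailFast(string messageId, Exception exception)
    {
        switch (exception)
        {
            // fail fast on our domain exception
            case DomainException _: return true;

            // fail fast on error 3701, as in the SqlException case above
            case DatabaseException databaseException when databaseException.Number == 3701: return true;

            // delegate all other behavior to the wrapped checker
            default: return _failFastChecker.ShouldFailFast(messageId, exception);
        }
    }
}
```

Because the specific cases are checked first and everything else falls through to the wrapped checker, exceptions implementing the marker interface still fail fast, while transient database errors such as a deadlock (SQL Server error 1205) are still retried.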

As you can see, we also extended our fail fast checker to avoid retrying in cases where SQL Server throws a "table does not exist, or you do not have permission" error (error number 3701) at us.