Outcomes, Mistakes, and the Nuance of Error Handling


After my last post, I decided it was time to finish up several projects, no matter how experimental, and roll a proper release for them. One such project, whose development I’m finally wrapping up, is a Rust library. It’s called outcome, and is a different take on error handling compared to Result<T, E> in Rust, (value, error) in Go, and std::expected<T, E> in C++. While the outcome crate name itself was taken, I found a “unique” hack to work around the issue. The project’s README explains further.

Its name is inspired by Niall Douglas' C++ Boost.Outcome library, which provides a tri-state monadic interface for handling a value, error code, and exception. Since first seeing this library several years ago, I’ve kept a close eye on it. I found it to be quite a nice interface, but the distinction it draws between an error code and an exception always felt “off”. On further reflection, I came to the personal conclusion that Niall is focused on a very narrow scope of error handling, in the vein of low level operating system or C library error codes (many of which are recoverable as no exception has been thrown), and C++ exceptions. Regardless of whether I am right, his work on Boost.Outcome helped inspire this Rust crate.

Having used libraries like nom, and having read Error Handling in a Correctness-Critical Rust Project by the author of sled some time ago, I realized there is a hole in not only how Rust libraries perform error handling, but in how recoverable errors are handled as well. As a result (no pun intended), I chose to adopt the Outcome name and apply it in a more generic and Rust-like fashion. Thus, we have a type with three states:

  1. Success
  2. Mistake
  3. Failure

If anything, Outcome is simply an augmentation of Result. It adds this third Mistake state to represent a soft or retryable error (i.e., something has gone wrong, but we are signalling to some caller that the issue can be fixed and that the user is free to try the operation again). A retryable error can be considered any operation that might not have succeeded, either due to other operations (e.g., a disk read/write not completing in time), misconfiguration (e.g., forgetting to set a specific flag before calling a function), or busy resources (locks, databases, etc.). To handle a retryable error, developers need to have an API that provides nuance, a subject I will briefly expand on.
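As a rough sketch of the idea, the three states can be written as a plain enum and matched on by a caller that retries only on a Mistake. The Outcome enum below is a simplified local stand-in rather than the crate's actual definition, and read_block, read_with_retry, ReadMistake, and ReadFailure are hypothetical names invented for this illustration.

```rust
// Simplified stand-in for a tri-state result type (not the crate's real definition).
#[derive(Debug, PartialEq)]
enum Outcome<S, M, F> {
    Success(S),
    Mistake(M),
    Failure(F),
}

// Hypothetical soft and hard error types for a made-up "read a block" operation.
#[derive(Debug, PartialEq)]
enum ReadMistake {
    Busy,
}

#[derive(Debug, PartialEq)]
enum ReadFailure {
    DeviceGone,
}

// Hypothetical operation: the device is "busy" for the first two attempts.
fn read_block(attempt: u32, device_present: bool) -> Outcome<Vec<u8>, ReadMistake, ReadFailure> {
    if !device_present {
        return Outcome::Failure(ReadFailure::DeviceGone);
    }
    if attempt < 2 {
        return Outcome::Mistake(ReadMistake::Busy);
    }
    Outcome::Success(vec![0xAB; 4])
}

// Retry on a Mistake up to `max_attempts` times; give up immediately on a Failure.
fn read_with_retry(max_attempts: u32, device_present: bool) -> Result<Vec<u8>, String> {
    for attempt in 0..max_attempts {
        match read_block(attempt, device_present) {
            Outcome::Success(bytes) => return Ok(bytes),
            Outcome::Mistake(ReadMistake::Busy) => continue,
            Outcome::Failure(failure) => return Err(format!("fatal: {:?}", failure)),
        }
    }
    Err(String::from("gave up after retries"))
}
```

The point of the third variant is that the retry decision lives in the type: the caller does not have to inspect an error value to learn whether trying again is reasonable.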

Nuance and “Errors”

The existence of the ? operator in Rust is a huge boost for productivity, but it can lead to someone simply calling some_function()? and leaving the caller to discern whether a real error occurred or not. This breaks down a great deal when an error isn’t really an error, but represents a non-Ok state.

The Rust standard library has a great example of this. The Mutex type in Rust has, unsurprisingly, a try_lock function, which returns either a MutexGuard or a TryLockError. TryLockError has two variants itself: Poisoned (i.e., the lock could not be acquired because another thread panicked while holding the lock) and WouldBlock, which represents the false state we see in most try_lock functions from other languages that return a boolean, such as C++. Herein lies the issue:

In the case of a “simple” approach one could do something like

let x = mutex.try_lock()?;

and only move forward, letting the error bubble up as necessary. But this is a poor indication of what a user might want, and therefore to properly handle this error, someone must do a match expression instead:

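let x = match mutex.try_lock() {
  Ok(guard) => guard,
  // Soft error: the lock is currently held elsewhere; handle it somehow.
  Err(TryLockError::WouldBlock) => todo!("retry, back off, or report"),
  // Hard error: another thread panicked while holding the lock.
  Err(TryLockError::Poisoned(e)) => return Err(e),
};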

In the case of a WouldBlock, it becomes much more difficult to implement something like a spin lock with exponential back-off. This results in a loss of nuance to callers of any function, and depending on how deep one might be in a given call stack, could result in resources, context switches, and the like being wasted trying to re-acquire the lock in a loop.

The documentation for outcome has an example of using Outcome<S, M, F> to implement a spin lock with exponential back-off based on Timur Doumler’s C++ implementation. Unfortunately, because of how traits work in Rust, we can’t pass the try_lock implementation further up the chain via another Outcome, and instead must translate the Outcome back into a Result. This has the inherent side effect of showing that Outcome<S, M, F> cannot simply be used as a shim in all APIs, and instead sometimes requires users to refactor an entire section of an API.
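A minimal sketch of that shape, using only the standard library, might look like the following. It is not the example from the crate's documentation, nor Timur Doumler's implementation: Outcome is again a local stand-in, and WouldBlock, Poisoned, try_lock_outcome, and spin_lock are names assumed for illustration. The try_lock result is first translated into the tri-state form, the loop backs off on a Mistake, and the final answer is translated back into a Result.

```rust
use std::hint;
use std::sync::{Mutex, MutexGuard, PoisonError, TryLockError};

// Local stand-in for a tri-state result type (not the crate's real definition).
enum Outcome<S, M, F> {
    Success(S),
    Mistake(M),
    Failure(F),
}

// Hypothetical marker for the soft "lock is busy" state.
struct WouldBlock;

// Shorthand for the hard error that std reports for a poisoned lock.
type Poisoned<'a, T> = PoisonError<MutexGuard<'a, T>>;

// Translate std's TryLockResult into the tri-state form.
fn try_lock_outcome<T>(mutex: &Mutex<T>) -> Outcome<MutexGuard<'_, T>, WouldBlock, Poisoned<'_, T>> {
    match mutex.try_lock() {
        Ok(guard) => Outcome::Success(guard),
        Err(TryLockError::WouldBlock) => Outcome::Mistake(WouldBlock),
        Err(TryLockError::Poisoned(e)) => Outcome::Failure(e),
    }
}

// Spin with exponential back-off. Ok(None) means every attempt found the lock busy.
fn spin_lock<T>(mutex: &Mutex<T>, max_attempts: u32) -> Result<Option<MutexGuard<'_, T>>, Poisoned<'_, T>> {
    let mut spins: u32 = 1;
    for _ in 0..max_attempts {
        match try_lock_outcome(mutex) {
            Outcome::Success(guard) => return Ok(Some(guard)),
            Outcome::Mistake(WouldBlock) => {
                // Busy-wait for `spins` iterations, then double the wait (capped).
                for _ in 0..spins {
                    hint::spin_loop();
                }
                spins = (spins * 2).min(1024);
            }
            Outcome::Failure(poisoned) => return Err(poisoned),
        }
    }
    Ok(None)
}
```

Note how the return type of spin_lock falls back to Result with an Option inside it: once the retry loop is done, the soft state has to be flattened into something ordinary callers can consume.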

I would argue that, in most languages even outside of Rust, a majority of our time is spent logging an error and simply returning to a given caller. In reality, a warning or some retry operation is often merited; the nuance of what really is an error is something that needs to be taken into consideration when designing these critical paths that must handle mistakes and failures.

  1. Is it an exceptional failure if an HDD cannot read N bytes of data in M amount of time, or is it a soft error that’s worth retrying once or twice before giving up?

  2. Is it an error or a mistake if someone tries to compare two floats without epsilon approximation?

  3. Is it an error or a mistake if someone calls a function expecting a slice of bytes with an empty slice? And how does that differ from a slice of bytes with unexpected data?

  4. Is it an error or a mistake if a condition variable experiences a spurious wakeup?

  5. Is it an error or a mistake if a logical GPU device is lost, or a rendering API’s context was reset?

  6. Is it an error or a mistake if a plugin is not available (versus failing to load) upon request by some user?

These questions require nuance and an understanding of the problems that are being solved at a very low level. When anyone later reads your code and sees that the typical “return an error when not Ok” approach cannot be applied in a given space, it can be a great way to signal that care is required, that details must be paid attention to, and that simply returning an error will not always suffice.

Why Augment Result<T, E>?

But why augment Result? Why not just pass Result<Result<S, M>, F> around instead? Beyond the performance aspect of nested enums in Rust, in the post Error Handling in a Correctness-Critical Rust Project, under the section making unhandled errors unrepresentable, the author of sled states:

this led me to go for what felt like the nuclear solution, but after seeing how many bugs it immediately rooted out by simply refactoring the codebase, I’m convinced that this is the only way to do error handling in systems where we have multiple errors

The solution, as they explain in the next paragraph, is

make the global Error enum specifically only hold errors that should cause the overall system to halt - reserved for situations that require human intervention. Keep errors which relate to separate concerns in totally separate error types. By keeping errors that must be handled separately in their own types, we reduce the chance that the try ? operator will accidentally push a local concern into a caller that can’t deal with it.

As the author of this post later shows, the sled::Tree::compare_and_swap function returns a Result<Result<(), CompareAndSwapError>, sled::Error>. They state this looks way less cute but will

improve chances that users will properly handle their compare and swap related errors properly[sic]

let cas_result = sled.compare_and_swap(
  "dogs",
  "pickles",
  "catfood"
)?;

if let Err(cas_error) = cas_result {
  // handle expected issue
}

The issue with this return type is that there is technically nothing to stop a user from using what I would call the Rust WTF operator (??) to ignore these intermediate errors, especially if a function returns a catch-all error type like eyre::Report:

let cas = sled.compare_and_swap("dogs", "pickles", "catfood")??; // WTF M8

The alternative would be to make the interface of these Error types impossible to use with most error handling (and error reporting) libraries. This would result in an interface that is difficult to use and extremely inflexible for users.

Additionally, it would be hard to forbid this kind of usage with tools like clippy, as libraries like nom also rely on this approach and expect nested Results combined with moderately complex pattern matching to extract relevant information.
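To make the ?? problem concrete, here is a small self-contained sketch. It does not use sled or eyre: compare_and_swap, CasError, and FatalError are hypothetical stand-ins, and Box<dyn Error> plays the role of the catch-all error type. The careless function compiles without complaint and merges the local concern into the fatal one, while the careful function handles the inner error where it belongs.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical local concern: expected, and meant to be handled by the caller.
#[derive(Debug)]
struct CasError;

// Hypothetical global error: something that should halt the system.
#[derive(Debug)]
struct FatalError;

impl fmt::Display for CasError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "compare and swap failed")
    }
}

impl fmt::Display for FatalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "storage is unusable")
    }
}

impl Error for CasError {}
impl Error for FatalError {}

// Stand-in with the same nested shape as the return type discussed above.
fn compare_and_swap(current_matches: bool) -> Result<Result<(), CasError>, FatalError> {
    if current_matches {
        Ok(Ok(()))
    } else {
        Ok(Err(CasError))
    }
}

// With a catch-all error type, `??` compiles and both concerns collapse into one.
fn careless(current_matches: bool) -> Result<(), Box<dyn Error>> {
    compare_and_swap(current_matches)??;
    Ok(())
}

// Only the fatal error propagates; the local concern is handled in place.
fn careful(current_matches: bool) -> Result<bool, FatalError> {
    match compare_and_swap(current_matches)? {
        Ok(()) => Ok(true),
        Err(CasError) => Ok(false),
    }
}
```

Nothing in the type system distinguishes the two functions as better or worse, which is exactly why the nested Result alone does not prevent the mistake.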

Luckily, it is easier to prevent this issue in the first place if:

  1. An explicit call to extract an inner Result<T, E>-like type must be made
  2. An easily greppable/searchable function must be called before the WTF operator can be used.
  3. The Try trait returns a type that must be decomposed explicitly and does not support the try ? operator itself.

outcome provides this support in the form of a Concern type (a name taken directly from one of the sled author’s paragraphs quoted above), whose variants match the Success and Mistake of Outcome, as well as an associated method, Outcome::acclimate, which returns a Result<Concern<S, M>, F>. This call is only provided because implementing the Try trait in Rust is still only available on nightly.

At the moment, Concern is its own type, unusable with ?, and does not contain a failure state. Its variants of Success and Mistake are, unfortunately, unique and not shareable with Outcome. However, once never (!) has been stabilized in Rust, Concern will be an alias for Outcome<S, M, !>.
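The relationship between the two types can be sketched as follows. Both enums are simplified local stand-ins, assuming that Concern carries only the two non-failure states, and the body of acclimate is a guess at the obvious implementation given the signature described above. compare_and_swap and update are hypothetical functions made up for the example.

```rust
// Local stand-ins for the two types (not the crate's real definitions).
#[derive(Debug, PartialEq)]
enum Outcome<S, M, F> {
    Success(S),
    Mistake(M),
    Failure(F),
}

#[derive(Debug, PartialEq)]
enum Concern<S, M> {
    Success(S),
    Mistake(M),
}

impl<S, M, F> Outcome<S, M, F> {
    // Split off the Failure so `?` can propagate it; the rest becomes a Concern.
    fn acclimate(self) -> Result<Concern<S, M>, F> {
        match self {
            Outcome::Success(s) => Ok(Concern::Success(s)),
            Outcome::Mistake(m) => Ok(Concern::Mistake(m)),
            Outcome::Failure(f) => Err(f),
        }
    }
}

// Hypothetical operation: an empty store is fatal, a stale value is a mistake.
fn compare_and_swap(stored: &str, expected: &str) -> Outcome<(), String, String> {
    if stored.is_empty() {
        Outcome::Failure(String::from("store unavailable"))
    } else if stored != expected {
        Outcome::Mistake(format!("expected {}, found {}", expected, stored))
    } else {
        Outcome::Success(())
    }
}

fn update(stored: &str, expected: &str) -> Result<&'static str, String> {
    // One `?` peels off the Failure. A second `?` would not compile here,
    // because Concern is not a Result and must be matched explicitly.
    match compare_and_swap(stored, expected).acclimate()? {
        Concern::Success(()) => Ok("swapped"),
        Concern::Mistake(_) => Ok("stale value, retry"),
    }
}
```

The explicit acclimate call is also the greppable marker: every place where a failure is separated from a mistake shows up in a plain text search.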

Escalating States

One of the more interesting ideas I’ve decided to try with Outcome is that of “state escalation”. Specifically, we can sometimes perform some operation (e.g., “parse an integer”) that technically has succeeded but may not fit within our desired constraints (e.g., “The integer is not within the range 0..10”), and we want to mutate the state of the enum into a Mistake variant using the value stored in the Success state (either directly or passed to some closure that turns the value into some Mistake value). More often, we may also want to escalate from a Mistake to a Failure if some procedure has determined that the current state of retries has exceeded some constraint.

For this reason, Outcome provides an API for expressing this, while also simultaneously marking the previous state as invalid (i.e., !, never, or Infallible). Additionally, when escalating from Success to Mistake and then possibly to Failure, we want to at least discourage users from de-escalation of an Outcome.

This de-escalation prevention is actually much easier to implement in C++. Simply specialize some outcome<void, M, F> type that cannot be constructed or assigned with an outcome<S, M, F> publicly, and likewise, prevent an outcome<(not is_void<S>), M, F> from being constructed or assigned with an outcome<void, M, F>. This is then also repeated for an outcome<void, void, F>.

Sadly, Rust does not have any way to actually express this, and to implement such a feature would be very difficult (besides the obvious question of “what language feature is even needed?”). Until then, users simply have to consider that de-escalating an Outcome in a Mistake-only or Failure-only state is much like taking a Result in an Err state and trying to turn it back into an Ok. At this time, state escalation is still extremely experimental, and I’m unsure if it will survive to a 1.0 release, or if it will become more enhanced and featureful over time.
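One way to picture escalation on stable Rust is to use std::convert::Infallible as the "this state can no longer occur" marker. The sketch below is an assumption about the general shape, not the crate's API: Outcome is a local stand-in, and the method name escalate_with and the helper reject_out_of_range are hypothetical.

```rust
use std::convert::Infallible;

// Local stand-in for a tri-state result type (not the crate's real definition).
#[derive(Debug, PartialEq)]
enum Outcome<S, M, F> {
    Success(S),
    Mistake(M),
    Failure(F),
}

impl<S, M, F> Outcome<S, M, F> {
    // Hypothetical method: turn a Success into a Mistake via a closure.
    // The returned type has Infallible in the Success slot, so a Success
    // value can no longer be constructed for it.
    fn escalate_with<C>(self, to_mistake: C) -> Outcome<Infallible, M, F>
    where
        C: FnOnce(S) -> M,
    {
        match self {
            Outcome::Success(s) => Outcome::Mistake(to_mistake(s)),
            Outcome::Mistake(m) => Outcome::Mistake(m),
            Outcome::Failure(f) => Outcome::Failure(f),
        }
    }
}

// Hypothetical caller: the integer parsed fine, but it violates our constraint.
fn reject_out_of_range(n: u32) -> Outcome<Infallible, String, String> {
    Outcome::<u32, String, String>::Success(n)
        .escalate_with(|n| format!("{} is not within the range 0..10", n))
}
```

This marks the old state as unreachable in the type, but it only discourages de-escalation: nothing stops a user from matching on the value and building a fresh Outcome with a Success in it.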

Feature Support

Lastly, one small bit I’m actually proud of is the ability for Outcome to supply several cargo features that enhance its error handling capabilities and usability in other libraries, including

Of course, using the Termination trait and returning a Mistake still counts as an error. However, if a library were to replace Rust’s libtest, Rust were to adopt the Outcome type, or some third party put in the work otherwise, I’m sure tooling could be massaged to perform different actions based on whether a test returned a Mistake or a Failure.

Finishing Up

With all of that said, I don’t think I will ever consider this crate to be at 1.0 until Rust has stabilized both the Try trait and !. I’m quite pleased with this crate, even if it is experimental and needs to be adopted by other library writers before it can really hit its stride. I hope that if you do end up using it, you find it solves the problems that I’ve personally run into when operating in the Rust ecosystem.
