After my last post, I decided it was time to finish up several projects, no
matter how experimental, and roll a proper release for them. One such project
whose development I’m finally wrapping up is a Rust library. It’s called
outcome, and it is a different take on error handling compared to
Result<T, E> in Rust, (value, error) in Go, and std::expected<T, E> in C++.
While the actual outcome crate name was already taken, I found a “unique” hack
of a workaround for the issue. The project’s README explains further.
Its name is inspired by Niall Douglas’ C++ Boost.Outcome library, which
provides a tri-state monadic interface for handling a value, error code, and
exception. I’ve kept a close eye on this library since first seeing it several
years ago. I found it to be quite a nice interface, but the distinction between
an error code and an exception always felt “off”. On further reflection, I
came to the personal conclusion that Niall is focused on a very narrow scope of
error handling, in the vein of low-level operating system or C library error
codes (many of which are recoverable as no exception has been thrown), and C++
exceptions. Regardless of whether I am right, his work on Boost.Outcome
helped inspire this Rust crate.
Having used libraries like nom and read Error Handling in a
Correctness-Critical Rust Project, by the author of sled, some time ago, I
realized there is a hole not only in how Rust libraries perform error handling,
but in how recoverable errors are handled as well. As a result (no pun
intended), I chose to adopt the Outcome name and apply it in a more generic and
Rust-like fashion. Thus, we have an enum with three states:
- Success
- Mistake
- Failure
If anything, Outcome is simply an augmentation of Result. It adds this third
Mistake state to represent a soft or retryable error (i.e., something has gone
wrong, but we are signalling to some caller that the issue can be fixed and
that the user is free to try the operation again). A retryable error covers any
operation that did not succeed because of other operations (e.g., a disk
read/write not completing in time), misconfiguration (e.g., forgetting to set a
specific flag before calling a function), or busy resources (locks, databases,
etc.). To handle a retryable error, developers need an API that provides
nuance, a subject I will briefly expand on.
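To make the three states concrete, here is a minimal sketch of what such an enum and a retry loop around it could look like. It is not the crate's actual source: the enum is re-declared locally, and read_sensor and read_with_retries are hypothetical names invented for illustration.

```rust
// Local stand-in for the three-state type described above (not the crate's code).
#[derive(Debug, PartialEq)]
enum Outcome<S, M, F> {
    Success(S),
    Mistake(M),
    Failure(F),
}

// Hypothetical operation: busy on the first two attempts, succeeds on the
// third, and reports a hard failure on any later attempt.
fn read_sensor(attempt: u32) -> Outcome<u32, &'static str, &'static str> {
    match attempt {
        0 | 1 => Outcome::Mistake("resource busy"),
        2 => Outcome::Success(42),
        _ => Outcome::Failure("sensor offline"),
    }
}

// The caller decides what a Mistake means: here it simply retries until the
// attempt budget runs out, while a Failure is returned immediately.
fn read_with_retries(max_attempts: u32) -> Result<u32, &'static str> {
    for attempt in 0..max_attempts {
        match read_sensor(attempt) {
            Outcome::Success(value) => return Ok(value),
            Outcome::Mistake(_) => continue,
            Outcome::Failure(error) => return Err(error),
        }
    }
    Err("retries exhausted")
}
```

The point of the sketch is that the retry policy lives with the caller, while the callee only reports whether retrying makes sense at all.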
Nuance and “Errors”
The existence of the ? operator in Rust is a huge boost for productivity, but
it can encourage someone to simply call some_function()? and leave the caller
to discern whether a real error occurred or not. This breaks down a great deal
when an error isn’t really an error, but merely represents a non-Ok state.
The Rust standard library has a great example of this. The Mutex type in Rust
has, unsurprisingly, a try_lock function, which returns either a MutexGuard or
a TryLockError. TryLockError itself has two variants: Poisoned (i.e., the lock
could not be acquired because another thread panicked while holding the lock)
and WouldBlock, which represents the false state we see in most try_lock
functions from other languages that return a boolean, such as C++. Herein lies
the issue:
With a “simple” approach, one could write something like
let x = mutex.try_lock()?;
and only move forward, letting the error bubble up as necessary. But this is a
poor indication of what a user might want, so to properly handle this error,
someone must write a match expression instead:
let x = match mutex.try_lock() {
    Ok(guard) => guard,
    Err(TryLockError::WouldBlock) => todo!(), /* handle this soft error somehow? */
    Err(TryLockError::Poisoned(x)) => return Err(x),
};
In the case of a WouldBlock, it becomes much more difficult to implement
something like a spin lock with exponential back-off. This results in a loss of
nuance to callers of any function, and depending on how deep one might be in a
given call stack, could result in resources, context switches, and the like
being needlessly wasted trying to re-acquire the lock in a loop.
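As a rough illustration of what handling WouldBlock by hand looks like with only the standard library, the sketch below retries try_lock with an exponentially growing sleep. The helper name lock_with_backoff, the starting delay, and the attempt budget are assumptions made for this example; it sleeps rather than spins, and it is not the implementation from the outcome documentation.

```rust
use std::sync::{Mutex, MutexGuard, PoisonError, TryLockError};
use std::thread;
use std::time::Duration;

// Hypothetical helper: try to take the lock up to `max_attempts` times,
// doubling the sleep between attempts. Ok(None) means "still busy".
fn lock_with_backoff<T>(
    mutex: &Mutex<T>,
    max_attempts: u32,
) -> Result<Option<MutexGuard<'_, T>>, PoisonError<MutexGuard<'_, T>>> {
    let mut delay = Duration::from_micros(1);
    for _ in 0..max_attempts {
        match mutex.try_lock() {
            Ok(guard) => return Ok(Some(guard)),
            // Soft error: back off, then try again.
            Err(TryLockError::WouldBlock) => {
                thread::sleep(delay);
                delay = delay.saturating_mul(2);
            }
            // Hard error: another thread panicked while holding the lock.
            Err(TryLockError::Poisoned(poisoned)) => return Err(poisoned),
        }
    }
    Ok(None)
}
```

Note that the return type already has to nest an Option inside a Result to keep "busy" apart from "poisoned", which is the kind of shape a three-state type is meant to flatten.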
The documentation for outcome has an example of using Outcome<S, M, F> to
implement a spin lock with exponential back-off, based on Timur Doumler’s C++
implementation. Unfortunately, because of how traits work in Rust, we can’t
pass the try_lock implementation further up the chain via another Outcome, and
instead must translate the Outcome back into a Result. This has the side effect
of showing that Outcome<S, M, F> cannot simply be used as a shim in all APIs,
and instead sometimes requires users to refactor an entire section of an API.
I would argue that, in most languages even outside of Rust, a majority of our time is spent logging an error and simply returning to a given caller. In reality, a warning or some retry operation is often merited; the nuance of what really is an error needs to be taken into consideration when designing the critical paths that must handle mistakes and failures.
- Is it an exceptional failure if an HDD cannot read N bytes of data in M amount of time, or is it a soft error that’s worth retrying once or twice before giving up?
- Is it an error or a mistake if someone tries to compare two floats without epsilon approximation?
- Is it an error or a mistake if someone calls a function expecting a slice of bytes with an empty slice? And how does that differ from a slice of bytes with unexpected data?
- Is it an error or a mistake if a condition variable experiences a spurious wakeup?
- Is it an error or a mistake if a logical GPU device is lost, or a rendering API’s context was reset?
- Is it an error or a mistake if a plugin is not available (versus failing to load) upon request by some user?
These questions require nuance and understanding of the problems that are being
solved at a very low level. When anyone later reads your code and sees that the
typical “return an error when not Ok” approach cannot be applied in a given
space, it can be a great way to signal that care is required, that details must
be paid attention to, and that simply returning an error will not always
suffice.
Why Augment Result<T, E>?
But why augment Result? Why not just pass Result<Result<S, M>, F> around
instead? Beyond the performance aspect of nested enums in Rust, in the post
Error Handling in a Correctness-Critical Rust Project, under the section
making unhandled errors unrepresentable, the author of sled states:
this led me to go for what felt like the nuclear solution, but after seeing how many bugs it immediately rooted out by simply refactoring the codebase, I’m convinced that this is the only way to do error handling in systems where we have multiple errors
The solution, as they explain in the next paragraph, is to
make the global Error enum specifically only hold errors that should cause the overall system to halt - reserved for situations that require human intervention. Keep errors which relate to separate concerns in totally separate error types. By keeping errors that must be handled separately in their own types, we reduce the chance that the try ? operator will accidentally push a local concern into a caller that can’t deal with it.
As the author of that post later shows, the sled::Tree::compare_and_swap
function returns a Result<Result<(), CompareAndSwapError>, sled::Error>.
They state this looks way less cute but will
improve chances that users will properly handle their compare and swap related errors properly[sic]
let cas_result = sled.compare_and_swap(
    "dogs",
    "pickles",
    "catfood"
)?;
if let Err(cas_error) = cas_result {
    // handle expected issue
}
The issue with this return type is that there is technically nothing to stop
a user from using what I would call the Rust WTF operator (??) to ignore
these intermediate errors, especially if a function returns a catch-all error
type like eyre::Report:
let cas = sled.compare_and_swap("dogs", "pickles", "catfood")??; // WTF M8
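The effect is easy to reproduce without sled. In the sketch below, every type and function name is hypothetical, and Box<dyn Error> stands in for a catch-all report type; the doubled ? compiles cleanly and the local conflict is silently promoted into the caller's error.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical local concern: the swap lost a race and can be retried.
#[derive(Debug)]
struct CasConflict;

impl fmt::Display for CasConflict {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "compare-and-swap conflict")
    }
}

impl Error for CasConflict {}

// Hypothetical global error: the whole system should halt.
#[derive(Debug)]
struct StorageHalted;

impl fmt::Display for StorageHalted {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "storage halted")
    }
}

impl Error for StorageHalted {}

// Same nested shape as the signature quoted above, with stand-in types.
fn compare_and_swap(conflict: bool) -> Result<Result<(), CasConflict>, StorageHalted> {
    if conflict { Ok(Err(CasConflict)) } else { Ok(Ok(())) }
}

// Both layers convert into the catch-all type, so `??` erases the distinction.
fn careless_caller(conflict: bool) -> Result<(), Box<dyn Error>> {
    compare_and_swap(conflict)??;
    Ok(())
}
```

From the outside, a caller of careless_caller can no longer tell a retryable conflict from a halted store without downcasting.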
The alternative would be to make the interface of these Error types impossible
to use with most error handling (and error reporting) libraries. This results
in an interface that is difficult to use and extremely inflexible for users.
Additionally, it would be hard to forbid this kind of usage with tools like
clippy, as libraries like nom also rely on this approach and expect nested
Results combined with moderately complex pattern matching to extract relevant
information.
Luckily, it is easier to prevent this issue in the first place if:
- An explicit call to extract an inner Result<T, E>-like type must be made.
- A call to an easily greppable/searchable function is required before using the WTF operator is permitted.
- The Try trait returns a type that must be decomposed explicitly and does not support the try ? operator itself.
outcome provides this support in the form of a Concern type (a name taken
directly from one of the sled author’s paragraphs quoted above), whose variants
match the Success and Mistake of Outcome, as well as an associated method,
Outcome::acclimate, which returns a Result<Concern<S, M>, F>. This call is only
provided because implementing the Try trait in Rust is still only available on
nightly.
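The shape described here can be sketched in a few lines. The types below are local stand-ins written for this example rather than the crate's source, and describe is a hypothetical caller; only the names Outcome, Concern, and acclimate and the Result<Concern<S, M>, F> return type come from the text above.

```rust
// Local stand-ins mirroring the shapes described above (not the crate's code).
#[derive(Debug, PartialEq)]
enum Outcome<S, M, F> {
    Success(S),
    Mistake(M),
    Failure(F),
}

#[derive(Debug, PartialEq)]
enum Concern<S, M> {
    Success(S),
    Mistake(M),
}

impl<S, M, F> Outcome<S, M, F> {
    // Split the hard failure from the two states a caller must inspect.
    fn acclimate(self) -> Result<Concern<S, M>, F> {
        match self {
            Outcome::Success(s) => Ok(Concern::Success(s)),
            Outcome::Mistake(m) => Ok(Concern::Mistake(m)),
            Outcome::Failure(f) => Err(f),
        }
    }
}

// Hypothetical caller: one `?` propagates the Failure, but the Concern has
// no Try support, so it has to be matched explicitly.
fn describe(outcome: Outcome<u32, &'static str, String>) -> Result<String, String> {
    match outcome.acclimate()? {
        Concern::Success(value) => Ok(format!("got {}", value)),
        Concern::Mistake(reason) => Ok(format!("retry later: {}", reason)),
    }
}
```

Because this Concern is a plain enum, writing acclimate()?? does not compile, and every place where a failure is split off can be found by searching for acclimate.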
At the moment, Concern is its own type, unusable with ?, and does not contain a
failure state. Its Success and Mistake variants are, unfortunately, unique and
not shareable with Outcomes. However, once never (!) has been stabilized in
Rust, Concern will be an alias for Outcome<S, M, !>.
Escalating States
One of the more interesting ideas I’ve decided to try with Outcome is that of
“state escalation”. Specifically, we can sometimes perform some operation
(e.g., “parse an integer”) that technically has succeeded but may not fit
within our desired constraints (e.g., “the integer is not within the range
0..10”), and we want to change the state of the enum into a Mistake variant
using the value stored in the Success state (either directly or by passing it
to some closure that turns the value into some Mistake value). More often, we
may want to escalate from a Mistake to a Failure if some procedure has
determined that the current number of retries has exceeded some constraint.
For this reason, Outcome provides an API for expressing this, while
simultaneously marking the previous state as invalid (i.e., !, never, or
Infallible). Additionally, when escalating from Success to Mistake and then
possibly to Failure, we want to at least discourage users from de-escalating an
Outcome.
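One way to picture escalation on stable Rust is to use std::convert::Infallible as the stand-in for never. The sketch below is an assumption-laden illustration: the enum is a local stand-in, and the method names escalate_with and escalate_mistake_with are invented here and may not match the crate's experimental API.

```rust
use std::convert::Infallible;

// Local stand-in for the three-state type (not the crate's code).
#[derive(Debug, PartialEq)]
enum Outcome<S, M, F> {
    Success(S),
    Mistake(M),
    Failure(F),
}

impl<S, M, F> Outcome<S, M, F> {
    // Hypothetical: turn a Success into a Mistake through `to_mistake`.
    // The returned type can no longer hold a Success value.
    fn escalate_with<C>(self, to_mistake: C) -> Outcome<Infallible, M, F>
    where
        C: FnOnce(S) -> M,
    {
        match self {
            Outcome::Success(s) => Outcome::Mistake(to_mistake(s)),
            Outcome::Mistake(m) => Outcome::Mistake(m),
            Outcome::Failure(f) => Outcome::Failure(f),
        }
    }
}

impl<M, F> Outcome<Infallible, M, F> {
    // Hypothetical: turn a Mistake into a Failure through `to_failure`.
    // Only the Failure state remains inhabited afterwards.
    fn escalate_mistake_with<C>(self, to_failure: C) -> Outcome<Infallible, Infallible, F>
    where
        C: FnOnce(M) -> F,
    {
        match self {
            // An Infallible value cannot exist, so this arm is unreachable.
            Outcome::Success(never) => match never {},
            Outcome::Mistake(m) => Outcome::Failure(to_failure(m)),
            Outcome::Failure(f) => Outcome::Failure(f),
        }
    }
}
```

The type parameters record how far an Outcome has been escalated, although nothing in this sketch stops a caller from building a fresh Outcome with a Success in it afterwards.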
This de-escalation prevention is actually much easier to implement in C++.
Simply specialize some outcome<void, M, F> type that cannot be constructed or
assigned with an outcome<S, M, F> publicly, and likewise, prevent an
outcome<(not is_void<S>), M, F> from being constructed or assigned with an
outcome<void, M, F>. This is then repeated for an outcome<void, void, F>.
Sadly, Rust does not have any way to actually express this, and to implement
such a feature would be very difficult (besides the obvious question of “what
language feature is even needed?”). Until then, users simply have to consider
that de-escalating an Outcome in a Mistake-only or Failure-only state is much
like taking a Result in an Err state and trying to turn it back into an Ok.
At this time, state escalation is still extremely experimental, and I’m
unsure if it will survive to a 1.0 release, or if it will become more enhanced
and featureful over time.
Feature Support
Lastly, one small bit I’m actually proud of is the ability for Outcome to
supply several cargo features that enhance its error handling capabilities
and usability in other libraries, including:
- no_std support
- basic integration with miette for use with its Diagnostic type
- basic integration with eyre for use with its Report type
- unstable API support in std
- nightly compiler feature support
  - try_trait_v2 (You can use Outcome as the return value of a function and use the ? operator. This is usable in conjunction with no_std) 🙂
  - termination_trait_lib (You can use Outcome as the return type of fn main, and any unit test)
Of course, using the Termination trait and returning a mistake still counts
as an error. However, if a library were to replace Rust’s libtest, Rust were to
adopt the Outcome type, or some third party put in the work otherwise, I’m
sure tooling could be massaged to perform different actions based on whether a
test returned a Mistake or a Failure.
Finishing Up
With all of that said, I don’t think I will ever consider this crate to be at
1.0 until Rust has stabilized the Try trait and !. I’m quite pleased with this
crate, even if it is experimental and needs to be adopted by other library
writers before it can really hit its stride. I hope that if you do end up using
it, you find it solves the problems that I’ve personally run into when
operating in the Rust ecosystem.