ADR 11: Refactoring ibc-rs's Error Handling Architecture
September 26, 2024 ยท View on GitHub
Changelog
- 2024-09-20: Draft Proposed
Status
Proposed
Context
As a library whose main use cases involve integrating with external host code bases, ibc-rs's error handling architecture should strive to achieve the following goals:
- Empower host developers to respond to ibc-rs-originating errors that affect host logic.
- Assist ibc-rs core developers' debugging efforts by providing detailed, human-readable error reports and/or stack traces.
These two aims highlight the need to clearly distinguish between host-level errors and protocol-level errors. Developers working at the host level often do not need to concern themselves with lower-level protocol errors; even if they do, those errors should not be exposed to host developers at the fine-grained level of granularity that is more appropriate for protocol-level developers.
As it currently stands, ibc-rs exposes a top-level ContextError type that is
returned by all validation and execution context methods, which are those that
need to be implemented by hosts as part of the ibc-rs integration process. This
ContextError type encapsulates all of the protocol-level errors that ibc-rs
exposes. As a result, the onus is being placed on host developers to decide
whether these errors are relevant or not. In other words, the distinction
between host- and protocol-level errors is not reflected in ibc-rs's error
handling methodology.
Proposal
Some Changes to ContextError
ibc-rs's top-level error type, ContextError, will be renamed to HandlerError
in an effort to improve the semantics of this error type. HandlerError will be
returned only by the top- level handler entrypoints of ibc-rs, namely the
dispatch, validate, and execute methods.
- pub enum ContextError {
+ pub enum HandlerError {
/// ICS02 Client error: {0}
Client(ClientError),
/// ICS03 Connection error: {0}
Connection(ConnectionError),
/// ICS04 Channel error: {0}
Channel(ChannelError),
- /// ICS04 Packet error: {0}
- Packet(PacketError),
/// ICS26 Router error: {0}
Router(RouterError),
}
In addition, the error variant that contained the PacketError type will be
removed. This is because we're opting to move towards only having a single error
type for each ICS module, removing the discrepancy that the ICS04 module exposes
two distinct error types. The PacketError type will be merged into the
ChannelError type.
The New HostError Type
In light of the stated rationale of making it clear when an error originates
from host logic vs ibc-rs's internal logic, we propose adding a new HostError
type that will live as a variant in each of the module-level error types, as
well as in the application-level error types, TokenTransferError and
NftTransferError. Initially, we had only a single Host(HostError) variant
that existed in the HandlerErrortype, but it became clear that this wasn't the
correct place in which to expose host-level errors. This is because host errors
can crop up within any of the core ibc-rs modules.
We introduce the following concrete HostError type in the ics24-host crate:
pub enum HostError {
/// invalid state: {description}
InvalidState { description: String },
/// missing state: {description}
MissingState { description: String },
/// failed to update store: {description}
FailedToStore { description: String },
/// failed to retrieve from store: {description}
FailedToRetrieve { description: String },
/// other error: {description}
Other { description: String },
}
This initial definition offers fairly generic error variants that nonetheless aim to capture most of the error use-cases faced by hosts. One such notable use-case is fetching/retrieving data from the host's storage.
Host errors can occur within any of the core ibc-rs modules. Thus, we'll be
adding HostError variants to each of the module-level error types where
appropriate: ClientError in ICS02, ConnectionError in ICS03, and
ChannelError in ICS04. Note that as of now a HostError variant is not being
added to the RouterError type in ICS26 as it is not used by hosts, i.e., it
does not expose its own handlers.
The main areas where HostErrors are now being returned are mainly in
ValidationContext and ExecutionContext trait methods. These traits are the
ones that hosts need to implement as part of the process of integrating ibc-rs.
As an example, consider the [consensus_state] method under the
ClientValidationContext trait. This method used to return a ContextError.
One place where it is called is in the upgrade_client handler:
pub fn validate<Ctx>(ctx: &Ctx, msg: MsgUpgradeClient) -> Result<(), ContextError>
...
let old_consensus_state = client_val_ctx
.consensus_state(&old_cons_state_path)
.map_err(|_| ClientError::MissingConsensusState {
client_id,
height: old_client_state.latest_height(),
})?;
The ContextError of the consensus_state method was being mapped onto a
ClientError, which was then mapped back into a ContextError; certainly an
inefficient round-trip. In the case that an error did occur in the
consensus_state method, it would not have been clear to the user that the
error originated from a host context.
This validate function will now be changed to return a ClientError. Coupled
with the consensus_state method now returning a HostError, this call can now
be made much more cleanly:
pub fn validate<Ctx>(ctx: &Ctx, msg: MsgUpgradeClient) -> Result<(), ClientError>
...
let old_consensus_state = client_val_ctx
.consensus_state(&old_cons_state_path)?;
HostErrors cleanly map to ClientErrors (along with any other of the module
errors in ibc-core). In the same vein, ClientErrors map cleanly to
HandlerErrors: with these changes, the granularity of where and how errors are
defined in ibc-rs makes a lot more sense. Plus, the error will now carry the
relevant host context!
Defining a New DecodingError Type
Another significant change that will be made as part of refactoring ibc-rs's
error handling is the introduction of a new DecodingError type. The purpose of
this type is to serve as the single error type returned by every raw TryFrom
conversion in ibc-rs. A major chunk of ibc-rs's logic deals with these sorts of
conversions. As it stands, depending on what type was being converted into,
different module-level errors were used as the error type.
For example, a TryFrom<Any> for AnyClientState impl would return a
ClientError, while a TryFrom<Vec<RawProofSpec>> for ProofSpecs impl would
return a CommitmentError. As a result, there were many conversion-related
error variants scattered across all the different error types, which led to a
lot of duplication and redundancy. In essence, these methods are all dealing
with decoding and/or deserialization in some form or another.
The introduction of the DecodingError type seeks to consolidate these
duplications and redundancies:
pub enum DecodingError {
/// identifier error: {0}
Identifier(IdentifierError),
/// base64 decoding error: {0}
Base64(Base64Error),
/// utf-8 String decoding error: {0}
StringUtf8(FromUtf8Error),
/// utf-8 str decoding error: {0}
StrUtf8(Utf8Error),
/// protobuf decoding error: {0}
Protobuf(ProtoError),
/// prost decoding error: {0}
Prost(ProstError),
/// invalid hash bytes: {description}
InvalidHash { description: String },
/// invalid JSON data: {description}
InvalidJson { description: String },
/// invalid raw data: {description}
InvalidRawData { description: String },
/// missing raw data: {description}
MissingRawData { description: String },
/// mismatched resource name: expected `{expected}`, actual `{actual}`
MismatchedResourceName { expected: String, actual: String },
/// unknown type URL `{0}`
UnknownTypeUrl(String),
}
This type captures most of external decoding- and parsing-related errors that crop up in ibc-rs, as well as variants for encoding internal decoding issues.
Deploying this error type across all of ibc-rs's conversions will have a marked effect on the number of outstanding variants held by the module-level errors, allowing us to reduce a significant number of variants and reduce these module error to something much more akin to its essence.
Positives
The changes introduced in this ADR represent a clear net positive effect on the usability and maintainability of ibc-rs as a whole. The overall hierarchy and semantics of the error system make a lot more sense than its prior incarnation. Module-level errors are much more simplified and streamlined. Additionally, host contexts can be attached to ibc-core errors, making it much clearer where an error originated from.
Negatives
With all that said, these changes of course come with some downsides. A major
one would be the generality of the error types that were introduced: it's not
clear whether we captured the right level of granularity with them. Especially
with the HostError type, it's not clear whether the way this type is laid out
is sufficient for hosts, or whether they would prefer something more bespoke and
tailored to their particular needs.
Most of the new error variants introduced also require String allocations, which is ideal; this is a tradeoff between generality of error variants and specificity. Introducing more specific error variants would help cut down on the number of String allocations, but would contribute to bloating and redundancy within ibc-rs's error types.
Lastly, the new error types and variants do not come with guard rails to help steer ibc-rs's contributors towards following the conventions laid out in their usage. It is very easy to abuse String-allocating variants for errors that they may not actually be appropriate for. The main guard against this comes in the form of PR review, which is not ideal.
Notes on Error Handling Conventions
Error Classifications
This section details the conventions surrounding how error types, their variants, and the error messages contained within variants, are formatted and structured. It serves as a guide for how to parse and read ibc-rs's error messages.
The naming convention of variants follows a few distinct classifications, illustrated by the following example code snippet:
/// The possible classes of errors that error variants can encapsulate
enum ErrorClassifications {
/// nested error: {0}
NestedError(SomeError),
/// something is missing
Missing,
/// already exists
Duplicate,
/// something is invalid
Invalid,
/// mismatched thing: expected `{expected}`, actual `{actual}`
Mismatched { expected: String, actual: String },
/// failed to do something
FailedTo,
/// unknown resource
Unknown,
/// insufficient value
Insufficient,
}
A few exceptions to these classifications exist, but these classifications capture almost all of the error variants that exist within ibc-rs. New error variants defined going forward should fall into one of the above classifications.
Structuring Variants and Error Messages
We'll start with conventions that apply to all variants, regardless of their
classification. Error variants themselves should start with a capital letter,
while error messages should all start with a lower-case letter.
String-interpolated values within error messages should all be surrounded by
backticks, in order to signify that it is an interpolated value. The exception
to this is nested errors that are appended to the end of an error message: these
interpolated values are not surrounded by backticks. Instead, they should
all follow a colon in the error message itself; thus, colons indicate that the
following is a nested error message. Lastly, error messages should all start
with the classification of its respective error variant, i.e., an error message
for an Invalid error class should start with "invalid...", while an error
message for a Missing error class should start with "missing...", etc.
When it comes to the NestedError and Mismatched classifications, the
structure and formatting do not deviate. NestedErrors are always newtype
wrappers around a contained error. The sole purpose of these variants is to
provide a means of converting the contained error to the containing error type.
Thus, every NestedError variant should be accompanied by a From<SomeError> for ContainingError impl. The naming scheme for NestedError variants should
include the name of the contained error, minus the word "Error" itself. The
error message then should clearly delineate the type of the contained error, the
fact that the message is referring to a lower-level error, and the contents of
the lower-level error. An example looks like this:
/// timestamp error: {0}
Timestamp(TimestampError),
Note the colon followed by the interpolation of the nested error: it should not be surrounded by backticks.
The Mismatched classification is used for situations where an expected
instance of a type is known, along with the instance that was actually found.
These are both included in the error under the expected and actual fields.
The error message for this class should always include "expected {expected},
actual {actual}"; this statement should be precluded by "mismatched
[TYPE]:". The type should be spelled in the singular and followed by a colon.
The expected and actual interpolated values in the error message should be
surrounded by {} backticks and then braces.
The rest of the error classes are a bit more freeform in how they are structured. They could take the form of unit structs that do not contain any values, serving mainly to surface a specific error message and nothing more.
/// missing attribute key
MissingAttributeKey,
/// missing attribute value
MissingAttributeValue,
These two variants each highlight the fact that a very particular resource is missing.
Single-element unit structs are used primarily to surface an invalid or duplicate type, opting to not provide any additional context other than what the error message itself provides.
/// invalid status `{0}`
InvalidStatus(Status),
/// duplicate client state `{0}`
DuplicateClientState(ClientId),
The most common form of variant is a one-element struct with a description
field. This serves to allow injecting more context at the call site of the error
in the form of a String. This is the most general variant: care should be taken
to not allow these sorts of variants to be abused for unintended purposes. Note
that, like variants that contain nested errors, descriptions should be
interpolated in the error message without enclosing backticks.
/// failed to verify header: {description}
FailedToVerifyHeader { description: String },
/// invalid hash: {description}
InvalidHash { description: String },
Future Work
In light of the stated downsides, a natural follow-up of this work would be to
generalize ibc-rs's error handling architecture to allow hosts to introduce
their own bespoke error types. This would allow host developers to define more
precise error types and variants that can better signal what the root cause of
an error might be. This would improve upon the rather generic error variants
that are exposed through ibc-rs's own HostError definition.
The downside is that this would add additional work on top of the already considerable amount of work that it takes to integrate ibc-rs into host chains.