ADR-0017: Error taxonomy

Status: Accepted
Date: 2026-04-21

Context

ADR-0013 bar item 8 requires: “Errors are typed and structured. Every thrown error is an instance of a named class that extends Error. Every named class carries structured fields relevant to its failure mode. Every error class has a documented log level and a documented user-facing message policy.”

The platform already has one such class — ContextAssemblyError — which extends native Error directly and carries a readonly z.core.$ZodIssue[] field. That’s fine as a one-off, but as a pattern it leads to a zoo of independently-shaped error classes. Each integrates differently with the logger, each decides its own severity, each handles its own serialisation. That’s the state bar 8 is designed to prevent.

This ADR answers two questions: what shape do error classes share, and what is the minimum set of concrete classes we ship today?

Decision

All platform errors extend a single AgentPlatformError base class. Ship a minimal initial set of concrete subclasses; grow it deliberately.

Base class (@agent-platform/errors):

Extends native Error so standard try/catch, stack traces, and tool interop work unchanged.
Constructor takes a typed options object: code, severity, message, optional context, optional userMessage, optional causedBy.
code: string — stable UPPER_SNAKE_CASE identifier (e.g. CONTEXT_ASSEMBLY_FAILED). Used for log filtering, metrics, and mapping to user messages.
severity: ErrorSeverity — 'fatal' | 'error' | 'warn' | 'info'. Maps 1:1 to logger levels. Every class declares a default; instances can override.
context: Readonly<Record<string, unknown>> — structured fields relevant to the failure.
userMessage?: string — optional end-user-safe message distinct from .message.
causedBy?: unknown — the lower-level error being wrapped. Also set on native Error.cause for tooling interop.
toJSON() returns a serialisation-safe shape; it does not include stack traces (ADR-0013 bar 3).
toJSONWithStack() returns the same shape plus the stack, for controlled diagnostic contexts only.
Recursive causedBy serialisation: platform errors serialise via toJSON(); native errors contribute name and message only; any other value is coerced via String().
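The base-class contract listed above can be sketched roughly as follows. This is a minimal illustration, not the shipped implementation: the field and method names come from this ADR, while the helper name serializeCause, the exact layout of SerializedError, and the example values are assumptions.

```typescript
// Sketch only. Field names follow this ADR; serializeCause and the exact
// SerializedError layout are assumptions made for illustration.
type ErrorSeverity = 'fatal' | 'error' | 'warn' | 'info';

interface AgentPlatformErrorOptions {
  code: string;
  severity: ErrorSeverity;
  message: string;
  context?: Readonly<Record<string, unknown>>;
  userMessage?: string;
  causedBy?: unknown;
}

interface SerializedNativeError {
  name: string;
  message: string;
}

interface SerializedError extends SerializedNativeError {
  code: string;
  severity: ErrorSeverity;
  context: Readonly<Record<string, unknown>>;
  userMessage?: string;
  causedBy?: SerializedError | SerializedNativeError | string;
  stack?: string;
}

class AgentPlatformError extends Error {
  readonly code: string;
  readonly severity: ErrorSeverity;
  readonly context: Readonly<Record<string, unknown>>;
  readonly userMessage?: string;
  readonly causedBy?: unknown;

  constructor(options: AgentPlatformErrorOptions) {
    // Forwarding `cause` to super keeps native Error.cause and causedBy
    // pointing at the same value (needs an ES2022 runtime and lib).
    super(options.message, options.causedBy === undefined ? undefined : { cause: options.causedBy });
    this.name = 'AgentPlatformError';
    this.code = options.code;
    this.severity = options.severity;
    this.context = options.context ?? {};
    this.userMessage = options.userMessage;
    this.causedBy = options.causedBy;
  }

  // Default-safe shape: no stack trace.
  toJSON(): SerializedError {
    const out: SerializedError = {
      name: this.name,
      code: this.code,
      severity: this.severity,
      message: this.message,
      context: this.context,
    };
    if (this.userMessage !== undefined) out.userMessage = this.userMessage;
    if (this.causedBy !== undefined) out.causedBy = serializeCause(this.causedBy);
    return out;
  }

  // Same shape plus the stack, for controlled diagnostic contexts only.
  toJSONWithStack(): SerializedError {
    return { ...this.toJSON(), stack: this.stack };
  }
}

// Platform errors recurse via toJSON(); native errors keep name and message;
// any other thrown value is coerced with String().
function serializeCause(cause: unknown): SerializedError | SerializedNativeError | string {
  if (cause instanceof AgentPlatformError) return cause.toJSON();
  if (cause instanceof Error) return { name: cause.name, message: cause.message };
  return String(cause);
}

// Example values are illustrative only.
const wrapped = new AgentPlatformError({
  code: 'CONTEXT_ASSEMBLY_FAILED',
  severity: 'error',
  message: 'context assembly failed',
  context: { step: 'merge' },
  causedBy: new TypeError('bad input'),
});
// JSON.stringify picks up toJSON(), so the stack stays out of the output.
console.log(JSON.stringify(wrapped, null, 2));
```

Because JSON.stringify calls toJSON() automatically, any consumer that serialises an error without thinking about it lands on the stack-free path.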

Concrete classes shipped in this ADR:

ConfigError — code: 'CONFIG_ERROR', default severity 'fatal'. Platform configuration missing, malformed, or contradictory.
ValidationError — code: 'VALIDATION_ERROR', default severity 'error'. Input failed validation at a trust boundary. Convention: context.issues carries a Zod-style issue array.
ContextAssemblyError — retrofitted in this ADR. Was a direct Error subclass; now extends AgentPlatformError with code: 'CONTEXT_ASSEMBLY_FAILED' and severity 'error'. .issues is preserved as a typed accessor for the Zod issue array; same reference is available via .context.issues.
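As a rough sketch of how the three concrete classes could sit on top of the base, assuming a trimmed stand-in for AgentPlatformError and a dependency-free IssueLike type in place of z.core.$ZodIssue. The constructor signatures and the ConcreteOptions type are hypothetical; only the codes, default severities, and the .issues accessor come from this ADR.

```typescript
type ErrorSeverity = 'fatal' | 'error' | 'warn' | 'info';

// Trimmed stand-in for the base class; serialisation is omitted here.
class AgentPlatformError extends Error {
  readonly code: string;
  readonly severity: ErrorSeverity;
  readonly context: Readonly<Record<string, unknown>>;

  constructor(options: {
    code: string;
    severity: ErrorSeverity;
    message: string;
    context?: Readonly<Record<string, unknown>>;
  }) {
    super(options.message);
    this.name = 'AgentPlatformError';
    this.code = options.code;
    this.severity = options.severity;
    this.context = options.context ?? {};
  }
}

// Stand-in for z.core.$ZodIssue so the sketch has no dependencies.
interface IssueLike {
  path: ReadonlyArray<string | number>;
  message: string;
}

// Hypothetical per-instance overrides.
interface ConcreteOptions {
  severity?: ErrorSeverity;
  context?: Readonly<Record<string, unknown>>;
}

class ConfigError extends AgentPlatformError {
  static readonly CODE = 'CONFIG_ERROR';

  constructor(message: string, options: ConcreteOptions = {}) {
    super({ code: ConfigError.CODE, severity: options.severity ?? 'fatal', message, context: options.context });
    this.name = 'ConfigError';
  }
}

class ValidationError extends AgentPlatformError {
  static readonly CODE = 'VALIDATION_ERROR';

  constructor(message: string, issues: readonly IssueLike[], options: ConcreteOptions = {}) {
    // Convention from this ADR: the issue array lives at context.issues.
    super({
      code: ValidationError.CODE,
      severity: options.severity ?? 'error',
      message,
      context: { ...options.context, issues },
    });
    this.name = 'ValidationError';
  }
}

class ContextAssemblyError extends AgentPlatformError {
  static readonly CODE = 'CONTEXT_ASSEMBLY_FAILED';

  constructor(message: string, issues: readonly IssueLike[]) {
    super({ code: ContextAssemblyError.CODE, severity: 'error', message, context: { issues } });
    this.name = 'ContextAssemblyError';
  }

  // Typed view over the same array reference stored at context.issues.
  get issues(): readonly IssueLike[] {
    return this.context['issues'] as readonly IssueLike[];
  }
}

const issues: IssueLike[] = [{ path: ['agent', 'name'], message: 'Required' }];
const assemblyError = new ContextAssemblyError('context assembly failed', issues);
console.log(assemblyError.name, assemblyError.code, assemblyError.issues === assemblyError.context['issues']);
```

Implementing .issues as a getter over context.issues, rather than as a second stored field, is one way to guarantee that both paths return the same reference.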

Conventions for adding future concrete classes:

Add one only when a call site needs to distinguish it from every existing class. Per-component error classes for their own sake create noise with no informational value.
Set this.name in the constructor so logs show the subclass name, not the base.
Declare a static readonly CODE so call sites can reference the code without instantiating.
Document the default severity and when an instance should override (usually in the class’s TSDoc).
Tests cover the code+severity+name triad, that the static CODE matches the code on instances, that instances pass instanceof AgentPlatformError, and the toJSON() shape.
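The test checklist in these conventions could be expressed as a single conformance helper along the following lines. This is a sketch under assumptions: taxonomyViolations and the Expected type are hypothetical names, the base class is a trimmed stand-in, and in the real suite each check would more likely be its own Vitest case.

```typescript
type ErrorSeverity = 'fatal' | 'error' | 'warn' | 'info';

// Trimmed stand-in for the base class described in this ADR.
class AgentPlatformError extends Error {
  readonly code: string;
  readonly severity: ErrorSeverity;

  constructor(options: { code: string; severity: ErrorSeverity; message: string }) {
    super(options.message);
    this.name = 'AgentPlatformError';
    this.code = options.code;
    this.severity = options.severity;
  }

  toJSON(): { name: string; code: string; severity: ErrorSeverity; message: string } {
    return { name: this.name, code: this.code, severity: this.severity, message: this.message };
  }
}

class ConfigError extends AgentPlatformError {
  static readonly CODE = 'CONFIG_ERROR';

  constructor(message: string) {
    super({ code: ConfigError.CODE, severity: 'fatal', message });
    this.name = 'ConfigError';
  }
}

interface Expected {
  name: string;
  code: string;
  severity: ErrorSeverity;
}

// Hypothetical helper: returns a list of violations, empty when conformant.
function taxonomyViolations(err: unknown, staticCode: string, expected: Expected): string[] {
  if (!(err instanceof AgentPlatformError)) return ['not an AgentPlatformError'];
  const violations: string[] = [];
  if (err.name !== expected.name) violations.push(`name is ${err.name}`);
  if (err.code !== expected.code) violations.push(`code is ${err.code}`);
  if (err.severity !== expected.severity) violations.push(`severity is ${err.severity}`);
  if (staticCode !== err.code) violations.push('static CODE does not match instance code');
  const json = err.toJSON();
  if ('stack' in json) violations.push('toJSON() must not include a stack');
  if (json.name !== err.name || json.code !== err.code) violations.push('toJSON() shape drifted');
  return violations;
}

const violations = taxonomyViolations(new ConfigError('missing API key'), ConfigError.CODE, {
  name: 'ConfigError',
  code: 'CONFIG_ERROR',
  severity: 'fatal',
});
console.log(violations.length === 0 ? 'conformant' : violations.join('; '));
```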

Consequences

Bar 8 is enforceable by code search. Searching platform packages for 'extends Error' (outside @agent-platform/errors itself) should return nothing; any hit is a violation.
The logger’s logError(err) method has a real contract. A platform error comes with its severity declared on the instance; logError uses it directly. Native errors fall back to level error with name+message only. This removes per-call-site decisions about “what level should this be?” — the decision lives on the error class.
context.issues is a convention, not a type. Any error that has a Zod-style issue array uses that path (ValidationError and ContextAssemblyError both do). Future error-rendering UI can rely on this key being present when relevant; a logger view can format it specially.
toJSON() excludes stacks by default. ADR-0013 bar 3 wants stack traces off the default log path because they leak build-server file paths. toJSONWithStack() exists for local debugging and for in-process error trackers with access controls. The default-safe path is the one most consumers use.
The base class handles Error.cause correctly. Setting causedBy in the options also sets native Error.cause via the super constructor — so util.inspect, Vitest’s pretty printers, and any future tooling that reads .cause see the wrapped error in the expected place. Our own .causedBy field gives stricter typing; both paths point at the same value.
Adding a concrete class is cheap: roughly 20 lines of class code plus about 6 small tests. The convention is to copy an existing concrete class and adapt it.
Retrofitting existing error classes is cheap. ContextAssemblyError’s retrofit changed fewer than 20 lines of runtime code and required no changes to its existing 15 tests; they all pass unchanged because the public contract they depend on (instance class, message shape, .issues accessor) is preserved.
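The logError contract described in these consequences might look something like the sketch below. The LogRecord shape, the Sink callback, and the handling of non-Error thrown values are assumptions for illustration; the ADR only fixes that platform errors log at their own severity and that native errors fall back to level error with name and message.

```typescript
type ErrorSeverity = 'fatal' | 'error' | 'warn' | 'info';

// Trimmed stand-in for the base class described in this ADR.
class AgentPlatformError extends Error {
  readonly code: string;
  readonly severity: ErrorSeverity;

  constructor(options: { code: string; severity: ErrorSeverity; message: string }) {
    super(options.message);
    this.name = 'AgentPlatformError';
    this.code = options.code;
    this.severity = options.severity;
  }

  toJSON(): { name: string; code: string; severity: ErrorSeverity; message: string } {
    return { name: this.name, code: this.code, severity: this.severity, message: this.message };
  }
}

// Hypothetical record and sink types; the real logger interface may differ.
interface LogRecord {
  level: ErrorSeverity;
  error: unknown;
}
type Sink = (record: LogRecord) => void;

function logError(err: unknown, sink: Sink): void {
  if (err instanceof AgentPlatformError) {
    // The level comes from the instance, not from the call site.
    sink({ level: err.severity, error: err.toJSON() });
  } else if (err instanceof Error) {
    // Native errors: level error, name and message only (no stack).
    sink({ level: 'error', error: { name: err.name, message: err.message } });
  } else {
    // Assumption: non-Error thrown values are coerced and logged at error.
    sink({ level: 'error', error: { message: String(err) } });
  }
}

const records: LogRecord[] = [];
const sink: Sink = (record) => {
  records.push(record);
};

logError(new AgentPlatformError({ code: 'CONFIG_ERROR', severity: 'fatal', message: 'no config' }), sink);
logError(new RangeError('out of range'), sink);
console.log(records.map((record) => record.level)); // levels: fatal, then error
```

The call sites pass only the error; nothing in them chooses a level, which is the point of putting severity on the class.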

Consequences for the repo

New workspace package: packages/errors/. Zero runtime dependencies.
Exported: AgentPlatformError, ConfigError, ValidationError. Types: AgentPlatformErrorOptions, ErrorSeverity, SerializedError, SerializedNativeError.
Retrofitted: @agent-platform/runtime now depends on @agent-platform/errors; ContextAssemblyError extends AgentPlatformError.
36 new tests in the errors package (20 base + 11 concrete + 5 runtime taxonomy-conformance). Workspace total crosses 139.

Alternatives considered

No hierarchy; use plain Error with a code property. Less ceremony, but less introspectability. Rejected because the structured fields (severity, context, userMessage, causedBy) repeatedly need to travel together, and reinventing that bundle in every subsystem produces the inconsistency that bar 8 is designed to prevent.
neverthrow / ts-pattern Result types. A fundamentally different programming style — errors as return values instead of thrown exceptions. Rejected for Phase 1 because the surrounding codebase is exception-based, the Vitest runner is exception-based, and forcing Result types retroactively would touch every function signature. May revisit locally in a specific component if the case arises.
One error class per subsystem (e.g. RuntimeError, SchemaError, MemoryError, ToolError). Tempting symmetry. Rejected because it lumps unrelated failure modes together: “RuntimeError” would cover both a validation failure (caller’s fault, error severity) and a missing binding (operator’s fault, fatal severity). The taxonomy here is distinguished by failure kind, not by which subsystem raised it — the kind carries the actionable information.
External error library (verror, haywire, etc.). Same rationale as rejecting pino: the logic we actually need is a few hundred lines, maintaining it is cheap, and avoiding a dependency with its own release cycle and audit surface is worth more than the saved code.
Make AgentPlatformError abstract / uninstantiable. Considered. Would force every call site to pick a subclass, which is a reasonable rule. Rejected because there are genuinely one-off cases in tests and internal utilities where constructing the base directly is the cleaner option, and “never instantiate the base at a real call site” is a convention we can enforce by review + lint without making the class abstract. If that convention turns out to be hard to maintain, we can flip to abstract later as a superseding ADR.
Include stack traces in toJSON(). Rejected as detailed in ADR-0013 bar 3.