
ADR-0019: Provider-agnostic LLM adapter interface


Status: Accepted
Date: 2026-04-21

ADR-0008 committed the platform to ModelTier as the agent-facing abstraction — agents ask for 'critical' | 'main' | 'sub', never a specific model name. That ADR left the adapter shape itself undefined: what does a call look like, what does it return, how are errors structured, where does the tier-to-model mapping happen?

ADR-0013 bars 5 and 10 set hard requirements for anything that makes LLM calls:

  • Bar 5: every LLM call produces a traceable structured record (agent id, task id, model, token counts, latency, cost, outcome).
  • Bar 10: TaskConstraints.time_budget_ms and cost_budget_usd must be enforced, not merely received.

With the foundation layer in place (errors, logger, config) and the LLM adapter being the next component to build, the interface shape can no longer be deferred. A component that doesn’t exist yet cannot cause problems; a component that exists with the wrong shape contaminates every downstream consumer.

This ADR commits the shape. The Anthropic concrete implementation is a separate ADR that will be written alongside the code.

The decision: ship two packages:

  1. @agent-platform/llm — the interface, types, error classes, and MockAdapter.
  2. @agent-platform/llm-<provider> — one per concrete provider (-anthropic, future -openai, etc.).

```ts
interface ModelAdapter {
  readonly provider: string;
  generate(request: ModelRequest): Promise<ModelResponse>;
}
```

One method. Non-streaming for now (see Alternatives). Throws a typed subclass of AgentPlatformError for every failure mode; it never resolves with a partial response and never rejects with an untyped error.

```ts
interface ModelRequest {
  tier: ModelTier; // ADR-0008
  system?: string;
  messages: readonly Message[];
  tools?: readonly ModelTool[];
  tool_choice?: ToolChoice;
  max_tokens: number;
  temperature?: number;
  stop_sequences?: readonly string[];
  time_budget_ms?: number; // enforced, not forwarded
  cost_budget_usd?: number; // enforced, not forwarded
  abort_signal?: AbortSignal;
}
```

Message carries either a string (shorthand) or an array of ContentBlocks. Content-block discriminated union is text | tool_use | tool_result. The shape mirrors Anthropic’s Messages API wire format deliberately — it is also the cleanest common ground with OpenAI’s Responses API, so translation for non-Anthropic providers is field-renaming rather than reshaping.
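The shapes described above can be sketched in TypeScript. The field names follow Anthropic's Messages API wire format as the ADR states, but the exact exported type names and the `toBlocks` helper are illustrative assumptions, not the platform's real exports:

```ts
// Sketch of the Message / ContentBlock shapes described above. Type names,
// fields, and the toBlocks helper are illustrative; the real exports may differ.
type TextBlock = { type: 'text'; text: string };
type ToolUseBlock = { type: 'tool_use'; id: string; name: string; input: unknown };
type ToolResultBlock = { type: 'tool_result'; tool_use_id: string; content: string };
type ContentBlock = TextBlock | ToolUseBlock | ToolResultBlock;

interface Message {
  role: 'user' | 'assistant';
  // A bare string is shorthand for a single text block.
  content: string | readonly ContentBlock[];
}

// Normalizes the string shorthand into block form.
function toBlocks(m: Message): readonly ContentBlock[] {
  return typeof m.content === 'string'
    ? [{ type: 'text', text: m.content }]
    : m.content;
}
```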

```ts
interface ModelResponse {
  model: string; // post tier resolution — for logs
  content: readonly ContentBlock[];
  stop_reason: 'end_turn' | 'max_tokens' | 'stop_sequence' | 'tool_use';
  usage: { input_tokens: number; output_tokens: number; cost_usd: number };
  latency_ms: number;
}
```

Always complete. cost_usd is the adapter’s best estimate based on published pricing at call time; pricing changes are handled by redeploying the adapter with an updated internal table.
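The pricing-table mechanism can be sketched as follows; the model id and per-token rates are placeholders, not real prices:

```ts
// Hypothetical internal pricing table (USD per million tokens). Figures are
// placeholders; per the ADR, the real table ships inside the adapter and is
// updated by redeploying.
const PRICING_USD_PER_MTOK: Record<string, { input: number; output: number }> = {
  'example-model': { input: 3, output: 15 },
};

function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING_USD_PER_MTOK[model];
  if (!p) return 0; // unknown model: no estimate available
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```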

Every LLM error extends AgentPlatformError (ADR-0017). Each class maps to a distinct caller action:

| Class | code | Default severity | Caller action |
| --- | --- | --- | --- |
| LLMAuthError | LLM_AUTH_ERROR | fatal | Operator problem; no retry will help |
| LLMRateLimitError | LLM_RATE_LIMIT | error | Retry with backoff; context.retry_after_ms when known |
| LLMTimeoutError | LLM_TIMEOUT | error | Our budget fired; retry with larger budget or degrade |
| LLMContextLengthError | LLM_CONTEXT_LENGTH | error | Retry only after shrinking input |
| LLMUnavailableError | LLM_UNAVAILABLE | error | Retry with backoff; 5xx / network |
| LLMInvalidRequestError | LLM_INVALID_REQUEST | error | Fix request before retrying; 4xx other than above |
| LLMBudgetExceededError | LLM_BUDGET_EXCEEDED | warn | Pre-flight refusal; no provider call was made |
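The "distinct caller action" claim can be made concrete with caller-side dispatch. The class names come from the table above; the base class and constructor shapes here are stand-ins, not the real definitions from the errors package:

```ts
// Sketch of caller-side dispatch over the error taxonomy. Stand-in classes:
// the real AgentPlatformError hierarchy is defined per ADR-0017.
class AgentPlatformError extends Error {}
class LLMRateLimitError extends AgentPlatformError {
  constructor(public readonly retry_after_ms?: number) { super('rate limited'); }
}
class LLMContextLengthError extends AgentPlatformError {}

type RetryPlan =
  | { action: 'backoff'; delay_ms: number }
  | { action: 'shrink_input' }
  | { action: 'fail' };

function planRetry(err: unknown): RetryPlan {
  if (err instanceof LLMRateLimitError) {
    // Retry with backoff, honoring retry_after_ms when the provider supplied it.
    return { action: 'backoff', delay_ms: err.retry_after_ms ?? 1_000 };
  }
  if (err instanceof LLMContextLengthError) {
    // Retrying without shrinking the input would fail identically.
    return { action: 'shrink_input' };
  }
  return { action: 'fail' };
}
```

Class-based `instanceof` dispatch like this is the ergonomic argument made under Alternatives against a single error class with a string code.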

Platform budgets are enforced, not forwarded

  • time_budget_ms → adapter creates an AbortController with a scheduled abort() call and passes the signal to the SDK. On timeout the adapter throws LLMTimeoutError with elapsed_ms and budget_ms in context.
  • cost_budget_usd → adapter computes a conservative pre-flight estimate (using its internal pricing table and an input-token count derived from a rough character-count heuristic, because accurate pre-call tokenization is not free). If the estimate already exceeds the budget, the adapter throws LLMBudgetExceededError without making the network call.
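Both enforcement paths can be sketched under assumed names; the ~4-chars-per-token heuristic, the function names, and the thrown error here are illustrative stand-ins:

```ts
// Sketch of the two budget-enforcement paths. Names and the heuristic
// constant are assumptions, not the adapter's real internals.
const CHARS_PER_TOKEN = 4; // rough heuristic; exact pre-call tokenization is not free

function assertWithinCostBudget(
  promptChars: number,
  maxOutputTokens: number,
  costBudgetUsd: number,
  inputUsdPerMTok: number,
  outputUsdPerMTok: number,
): void {
  const estInputTokens = Math.ceil(promptChars / CHARS_PER_TOKEN);
  // Conservative: assume the response spends the full max_tokens allowance.
  const estCostUsd =
    (estInputTokens * inputUsdPerMTok + maxOutputTokens * outputUsdPerMTok) / 1_000_000;
  if (estCostUsd > costBudgetUsd) {
    // Stand-in for LLMBudgetExceededError; thrown before any network call.
    throw new Error(`LLM_BUDGET_EXCEEDED: ~$${estCostUsd} > $${costBudgetUsd}`);
  }
}

// time_budget_ms: schedule an abort and hand the signal to the provider SDK.
function timeBudgetSignal(timeBudgetMs: number): { signal: AbortSignal; cancel: () => void } {
  const ctrl = new AbortController();
  const timer = setTimeout(() => ctrl.abort(), timeBudgetMs);
  return { signal: ctrl.signal, cancel: () => clearTimeout(timer) };
}
```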

Tier-to-model mapping lives inside each concrete adapter


Each createAnthropicAdapter({ apiKey, modelMap }) / createOpenAIAdapter({ apiKey, modelMap }) takes its own modelMap: Record<ModelTier, string>. No central ModelRouter today; when runtime provider A/B ever matters, a thin wrapper-over-N-adapters becomes additive work.
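The option shape can be sketched as follows; `AdapterOptions` and `resolveModel` are illustrative names, and only `modelMap: Record<ModelTier, string>` comes from the text above:

```ts
// Illustrative factory-option shape; real option names in the concrete
// provider packages may differ.
type ModelTier = 'critical' | 'main' | 'sub'; // ADR-0008

interface AdapterOptions {
  apiKey: string;
  modelMap: Record<ModelTier, string>; // tier → provider-specific model id
}

// Tier resolution happens inside the adapter, at call time.
function resolveModel(options: AdapterOptions, tier: ModelTier): string {
  return options.modelMap[tier];
}
```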

Consequences

  • ADR-0008 now has a concrete realization. ModelTier flows through ModelRequest.tier; concrete adapters own the mapping.
  • Bar 5 is mechanizable. Every consumer can log from response.latency_ms, response.model, response.usage without additional instrumentation.
  • Bar 10 is enforced at the single trust boundary where it matters. Budgets cannot be accidentally bypassed by forgetting to wire them through; the adapter either enforces or throws.
  • Consumers write provider-agnostic code. A task-running function takes ModelAdapter, not AnthropicAdapter. When a second provider lands, zero consumer-code changes.
  • Tests are offline by default. MockAdapter covers every consumer’s test surface. The Anthropic package will have its own integration tests gated behind an env flag, never pulling real API credit for unit test runs.
  • Seven error classes is more surface than a single LLMError. Justified because each maps to a distinct caller action. A consumer that handles LLMRateLimitError with exponential-backoff-and-retry but handles LLMContextLengthError with trim-and-retry needs them as distinct classes, not as switch (err.code) strings. The string-switch alternative is strictly worse ergonomics for the same information density.
  • Non-streaming only, today. Every consumer we have (and every consumer Phase 1 plans) generates one response per turn. When streaming is actually wanted, it is a separate method on ModelAdapter (generateStream) returning an async iterable — not a reshape of the current generate.
  • New workspace package: packages/llm/. Depends on @agent-platform/core (for ModelTier) and @agent-platform/errors. No external runtime dependencies.
  • 80 new tests: taxonomy conformance for 7 error classes + MockAdapter behavior. Workspace total 243.
  • Next session ships packages/llm-anthropic/ — depends on @anthropic-ai/sdk. Separate ADR for the concrete implementation.
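The offline-by-default testing story can be sketched with a hand-rolled stand-in; the real MockAdapter in @agent-platform/llm has its own API, which this does not claim to match:

```ts
// Offline-test sketch. MiniAdapter / MiniResponse are cut-down stand-ins for
// the real ModelAdapter types; no network and no API credit are involved.
interface MiniResponse { content: readonly { type: string; text?: string }[] }
interface MiniAdapter {
  readonly provider: string;
  generate(request: { tier: string; max_tokens: number }): Promise<MiniResponse>;
}

const mock: MiniAdapter = {
  provider: 'mock',
  async generate() {
    return { content: [{ type: 'text', text: 'canned reply' }] }; // canned response
  },
};

// A consumer written against the adapter interface runs unchanged against the mock.
async function firstText(adapter: MiniAdapter): Promise<string | undefined> {
  const res = await adapter.generate({ tier: 'main', max_tokens: 16 });
  return res.content.find((b) => b.type === 'text')?.text;
}
```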
Alternatives

  • One LLMError class with a string code field. Three instead of seven classes, simpler taxonomy to add to. Rejected: consumers reach for switch (err.code) and re-derive the same information; class-based dispatch is strictly richer for the same shape. The seven classes are the distinctions a caller cares about — collapsing them moves work from design time to every call site.
  • Streaming-first interface (AsyncIterable<ModelResponseChunk>). Every provider supports streaming and a streaming interface can always be collected into a non-streaming one. Rejected because the non-streaming interface has a clearer error model (one throw or one resolution), and no current or planned consumer needs streaming. Adding generateStream later as a separate method is cleaner than having today’s consumers collect-to-complete an async iterable for no reason.
  • A central ModelRouter today. createRouter({ anthropic, openai }) returns something that dispatches per-tier / per-request. Rejected: we have one provider today. Building a router before the second provider exists encodes assumptions we don’t yet have. When the second provider lands, a router is a thin wrapper — not worth the ceremony now.
  • Use Vercel’s ai SDK (@ai-sdk/anthropic, generateText, etc.). Widely used, covers streaming, multi-provider, tool use out of the box. Rejected because the AI SDK’s abstraction is opinionated in ways that don’t match our needs: its error model is flatter, it couples to React / Next.js idioms in spots, and adopting it means giving up the ADR-0013 bar 5 / bar 10 enforcement guarantees (budgets, per-call audit records) that we need to own at the adapter boundary. When we’ve stabilized enough to know we won’t need to wedge enforcement in at a deeper level, we can reevaluate — but that’s a retrofit worth ~5 days of work, not a 20-minute port.
  • Single package (@agent-platform/llm with an Anthropic adapter inside). Smaller graph. Rejected because test-setup consumers pull in the Anthropic SDK transitively whether they use it or not. A two-package split keeps unit tests free of external SDK code and makes future provider additions symmetric (each is its own package, not a fork-in-a-shared-package).
  • Keep the content-block shape fully generic (content: string). Simpler, loses tool-use support. Rejected because every realistic agent turn hits tool use and a fully-generic shape would force every caller to round-trip through a typed layer on their own. The Anthropic-aligned shape loses nothing today and lets tool-using agents work natively.
  • Model-registry lookup via a separate ModelRegistry service. Decouples tier-mapping from adapter construction. Rejected: adapter-owns-its-map is simpler, each adapter instance is already provider-specific, and the registry pattern earns its ceremony only when we have enough adapters to need centralized configuration. Today we have none.