
ADR-0020: Anthropic concrete LLM adapter

Status: Accepted Date: 2026-04-22

ADR-0019 committed the provider-agnostic ModelAdapter interface, the seven-class error taxonomy, and the two-package split (@agent-platform/llm for the interface; one sibling package per provider for concrete implementations). What it did not commit was the concrete implementation for any specific provider.

Session 2 of the LLM adapter work ships the first concrete adapter: @agent-platform/llm-anthropic. The questions this ADR answers are all decisions that surface only at the implementation layer — how SDK errors are classified, how budgets are enforced at millisecond granularity, how pricing is tracked, how the SDK’s behavior is controlled. The interface ADR deliberately deferred these.

Build against @anthropic-ai/sdk directly, not Vercel’s AI SDK


We use @anthropic-ai/sdk@0.90.x as a workspace dependency. Vercel’s @ai-sdk/anthropic was considered and rejected in ADR-0019 for the interface; the decision applies here too. The official SDK:

  • Gives us direct access to every APIError subclass so translation is class-based, not code-switching on strings.
  • Supports a fetch override for unit tests (scripted HTTP).
  • Runs on Cloudflare Workers natively (per its README).
  • Has no transitive Node-only dependencies.

We set maxRetries: 0; the SDK default is 2. Our seven-class error taxonomy is explicit about what’s retriable; retry policy is the caller’s concern, not the adapter’s. SDK-level retries interact poorly with time_budget_ms (each SDK retry eats into the same budget without the caller knowing), and they muddy rate-limit accounting because a 429 might be the SDK’s second attempt rather than its first.

Pricing table as code, with fallback = Opus rates


src/pricing.ts hard-codes the current-generation rates:

| Tier | Model | Input $/MTok | Output $/MTok |
| --- | --- | --- | --- |
| critical | claude-opus-4-6 | 5.00 | 25.00 |
| main | claude-sonnet-4-6 | 3.00 | 15.00 |
| sub | claude-haiku-4-5-20251001 | 1.00 | 5.00 |

Values verified April 2026. Pricing changes ~yearly; a change is a code edit + review + redeploy, not a runtime config. This matches the bar-12 discipline: don’t build configurability before you need it.

Unknown models fall back to Opus rates (the most expensive). If someone passes tier: 'main' with a modelMap pointing at a model we don’t know about, the pre-flight budget check estimates with Opus pricing. This is deliberate — a budget check should over-refuse when facing unknowns, never under-refuse. A new Claude generation that lands in someone’s modelMap before our pricing table updates should fail conservatively, not silently pass cheap estimates.
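The table plus the fallback fit in a few lines. A minimal sketch — the Rates shape and the ratesFor name are illustrative, not the actual src/pricing.ts exports:

```typescript
// Illustrative sketch: unknown models get the most expensive (Opus)
// rates so the pre-flight budget check over-refuses, never under-refuses.
interface Rates {
  inputPerMTok: number;  // $ per million input tokens
  outputPerMTok: number; // $ per million output tokens
}

const OPUS_RATES: Rates = { inputPerMTok: 5.0, outputPerMTok: 25.0 };

const PRICING: Record<string, Rates> = {
  'claude-opus-4-6': OPUS_RATES,
  'claude-sonnet-4-6': { inputPerMTok: 3.0, outputPerMTok: 15.0 },
  'claude-haiku-4-5-20251001': { inputPerMTok: 1.0, outputPerMTok: 5.0 },
};

function ratesFor(model: string): Rates {
  return PRICING[model] ?? OPUS_RATES; // conservative default
}
```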

Real English averages ~4 chars/token; we use 3 to over-estimate input size by ~33%. For a pre-flight budget check, the right direction of error is high. Output cost uses the full max_tokens as the upper bound.
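Under those assumptions the pre-flight estimate reduces to a one-liner; estimateCostUsd is a hypothetical name for illustration:

```typescript
// Illustrative sketch: 3 chars/token over-counts input tokens, and the
// full max_tokens is charged as the output upper bound.
const CHARS_PER_TOKEN = 3;

function estimateCostUsd(
  inputChars: number,
  maxTokens: number,
  inputPerMTok: number,  // $ per million input tokens
  outputPerMTok: number, // $ per million output tokens
): number {
  const inputTokens = Math.ceil(inputChars / CHARS_PER_TOKEN);
  return (inputTokens * inputPerMTok + maxTokens * outputPerMTok) / 1_000_000;
}
```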

We deliberately do not depend on tiktoken or @anthropic-ai/tokenizer. Accurate tokenization would add bundle weight for marginal accuracy gain; consumers needing penny-accurate pre-flight do their own tokenization outside the adapter.

Time-budget enforcement distinguishes our timeout from caller abort


The adapter creates its own AbortController. If time_budget_ms is set, a setTimeout fires ownController.abort() at that deadline. If the caller also passes abort_signal, we forward its abort into ownController. The SDK sees a single signal.
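A minimal sketch of that plumbing, assuming a standalone helper (combineSignals is illustrative; the real adapter inlines the equivalent logic):

```typescript
// Illustrative sketch: one AbortController feeds the SDK; our deadline
// and the caller's abort both funnel into it, and a flag records
// whether it was our timeout that fired.
function combineSignals(timeBudgetMs?: number, callerSignal?: AbortSignal) {
  const own = new AbortController();
  let timedOut = false;

  if (timeBudgetMs !== undefined) {
    setTimeout(() => { timedOut = true; own.abort(); }, timeBudgetMs);
  }
  if (callerSignal?.aborted) {
    own.abort(); // caller already aborted before we started
  } else {
    callerSignal?.addEventListener('abort', () => own.abort(), { once: true });
  }
  return { signal: own.signal, didTimeout: () => timedOut };
}
```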

On catch, we distinguish three cases:

  1. ownController.signal.aborted && !request.abort_signal?.aborted → our timeout fired. Throw LLMTimeoutError with elapsed_ms and budget_ms in context.
  2. request.abort_signal?.aborted → caller’s abort fired. Re-throw the SDK’s APIUserAbortError unchanged. This is user intent, not a platform failure; callers handling abort in their own try/catch expect that error shape.
  3. Otherwise → translate via translateSdkError.
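The branch order matters: caller abort wins over translation, and our timeout counts only when the caller did not abort. Reduced to a pure function (classifyCatch is an illustrative name, not adapter code):

```typescript
// Illustrative sketch of the three-way catch classification.
type CatchOutcome = 'llm-timeout' | 'rethrow-user-abort' | 'translate';

function classifyCatch(ownAborted: boolean, callerAborted: boolean): CatchOutcome {
  if (ownAborted && !callerAborted) return 'llm-timeout'; // case 1: our budget fired
  if (callerAborted) return 'rethrow-user-abort';         // case 2: caller intent
  return 'translate';                                     // case 3: translateSdkError
}
```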

Every class exported from @anthropic-ai/sdk maps to exactly one LLM error class:

| SDK | Platform | Severity | Notes |
| --- | --- | --- | --- |
| AuthenticationError | LLMAuthError | fatal | bad/missing API key |
| PermissionDeniedError | LLMAuthError | fatal | account lacks model access |
| RateLimitError | LLMRateLimitError | error | retry_after_ms from header when present |
| BadRequestError + context-length phrasing | LLMContextLengthError | error | |
| BadRequestError (generic) | LLMInvalidRequestError | error | |
| UnprocessableEntityError | same split as 400 | error | |
| NotFoundError | LLMInvalidRequestError | error | usually unknown model |
| ConflictError | LLMInvalidRequestError | error | |
| InternalServerError | LLMUnavailableError | error | |
| APIConnectionTimeoutError | LLMUnavailableError | error | distinct from our LLMTimeoutError |
| APIConnectionError | LLMUnavailableError | error | network / DNS / TLS |
| APIUserAbortError | re-thrown unchanged (caller intent) or LLMTimeoutError (our timeout) | — | branch on which signal fired |
| Unknown APIError subclass | LLMUnavailableError | error | safe default for new SDK versions |
| Non-Error thrown value | LLMUnavailableError | error | extremely rare but possible |

Context-length detection on 400/422 is a message-substring match against phrases like “prompt is too long”, “input is too long”, “maximum context length”. If the heuristic misses, the classification falls back to LLMInvalidRequestError — both are 4xx, both non-retriable without changes, so the behavioral impact is small. The distinction matters only for caller retry strategy (shrink-and-retry vs. fix-code-and-retry).
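The heuristic itself is a handful of lines; looksLikeContextLength is an illustrative name, with the phrase list taken from the text above:

```typescript
// Illustrative sketch of the 400/422 context-length heuristic: a
// case-insensitive substring match. When no phrase matches, the caller
// falls back to LLMInvalidRequestError.
const CONTEXT_LENGTH_PHRASES = [
  'prompt is too long',
  'input is too long',
  'maximum context length',
];

function looksLikeContextLength(message: string): boolean {
  const lower = message.toLowerCase();
  return CONTEXT_LENGTH_PHRASES.some((phrase) => lower.includes(phrase));
}
```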

Consumer owns the logger’s component tag


The adapter receives a Logger via constructor and logs with it directly — it does not call logger.child({ component: 'llm.anthropic' }). The @agent-platform/logger package enforces component as an immutable fixed field (an anti-spoofing property, so a payload can’t inject a false component name). Calling child({component: ...}) silently does nothing.

Callers construct their logger with the right component:

```typescript
const llmLogger = new JsonLogger({ component: 'llm.anthropic' });
const adapter = createAnthropicAdapter({ ..., logger: llmLogger });
```

Or wire a scope tag into bindings at the call site:

```typescript
const agentLogger = new JsonLogger({ component: 'marketing-agent' });
const llmLogger = agentLogger.child({ scope: 'llm.anthropic' });
```

Documented in the factory options with a code example.

Refusal and pause_turn stop reasons collapse to end_turn + warn log


The SDK’s StopReason union includes 'refusal' and 'pause_turn' in addition to our four. Rather than silently drop them or make up a mapping, the adapter:

  1. Maps both to 'end_turn' (the turn is semantically complete).
  2. Emits a warn-level log entry (llm_refusal or llm_pause_turn) so the signal isn’t lost.
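Sketched below; mapStopReason is an illustrative name, and the four-value union assumes the ADR-0019 spelling mirrors the SDK’s:

```typescript
// Illustrative sketch of the collapse-and-warn mapping.
type PlatformStopReason = 'end_turn' | 'max_tokens' | 'stop_sequence' | 'tool_use';

function mapStopReason(
  sdkReason: string,
  warn: (event: string) => void, // e.g. backed by logger.warn
): PlatformStopReason {
  if (sdkReason === 'refusal' || sdkReason === 'pause_turn') {
    warn(sdkReason === 'refusal' ? 'llm_refusal' : 'llm_pause_turn');
    return 'end_turn'; // the turn is semantically complete
  }
  return sdkReason as PlatformStopReason;
}
```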

This is a compromise. A cleaner answer is to amend ADR-0019’s StopReason union to include 'refusal', and add a corresponding field to ModelResponse so callers can branch on it. This is a tracked follow-up; it has not been done in this ADR because we haven’t yet seen the first agent’s behavior to know whether the information needs to reach the caller directly.

Integration tests gated by ANTHROPIC_API_KEY


adapter.integration.test.ts makes real API calls. Tests auto-skip when the env var is absent, which is CI’s default. Local developers run them manually after SDK upgrades to verify wire-compatible behavior. Each test uses Haiku + tiny prompts to stay under $0.001 per run.

The choice not to have CI run these is deliberate:

  • CI would need a real API key, which is a secret leak surface.
  • CI would burn tokens on every PR — cost scales with contributor volume.
  • The SDK’s behavior is stable between releases; when it drifts, the unit tests (using scripted fetch) catch the breakage first.
Consequences

  • Bar 5 is now fully mechanical. Every adapter.generate() call emits llm_call with model, tier, latency, tokens, cost, stop reason. A consumer logs zero boilerplate to satisfy bar 5.
  • Bar 10 is now fully mechanical. Both time_budget_ms and cost_budget_usd are enforced inside the adapter. Consumers cannot forget to wire them through.
  • We own every error the caller sees. No raw AnthropicError leaks past the adapter boundary. A consumer’s catch block only ever sees LLMAuthError | LLMRateLimitError | LLMTimeoutError | LLMContextLengthError | LLMUnavailableError | LLMInvalidRequestError | LLMBudgetExceededError (and APIUserAbortError only if the consumer passed their own abort_signal).
  • Pricing changes require a code-reviewed PR. Acceptable for a ~yearly-churn table; the tradeoff is zero runtime config ceremony.
  • Unknown models never bypass budget checks. The fallback-to-Opus pricing policy is the explicit guard. Future ADRs that add new providers or new routing mechanisms should preserve this property.
  • Refusal-as-end-turn is a known lossy mapping. Follow-up item for ADR-0019 amendment. Logged at warn, not silent.
  • New workspace package: packages/llm-anthropic/. Depends on @anthropic-ai/sdk@0.90.0, @agent-platform/llm, @agent-platform/logger, @agent-platform/errors, @agent-platform/core. First runtime dependency on an external SDK in the platform.
  • 65 new tests (13 pricing, 31 translate-errors, 21 adapter). 2 integration tests skipped unless ANTHROPIC_API_KEY is set. Workspace total: 308 passed + 2 skipped.
  • A “first real LLM call” becomes possible. Any future component that takes a ModelAdapter can be exercised end-to-end by supplying createAnthropicAdapter(...).
Alternatives considered

  • Mock the SDK via vi.mock rather than inject fake fetch. Simpler test setup, but tests the mock rather than the translation code. Our error taxonomy is the single most valuable thing to test, and mocking loses confidence that translation actually works against real SDK behavior. Rejected.
  • Let the SDK handle retries with maxRetries: 2. Default behavior; saves the caller from writing retry logic. Rejected because the SDK’s retry policy doesn’t match our typed-error taxonomy: it retries on specific status codes without knowing whether the caller has context-length vs rate-limit semantics, and it eats into our time_budget_ms invisibly.
  • Use a third-party pricing API / database. Rejected: pricing changes slowly enough that code + PR + redeploy is the right cadence. A runtime fetch adds a failure mode for every adapter call.
  • Include tiktoken for accurate pre-flight token counting. ~700 KB dependency. Rejected: 3-chars-per-token conservative estimate is good enough for “this request will obviously blow the budget” detection, which is the bar-10 requirement. Penny-accuracy isn’t the point.
  • Surface refusal and pause_turn as new StopReason values today. Cleaner than the collapse-and-warn pattern, but amends ADR-0019 before we know whether callers need the distinction. Deferred to a future ADR driven by real consumer needs.
  • Have the adapter construct its own logger internally from config. Breaks bar-7 injection. The caller-owns-logger pattern is the right architecture even if it means the component tag lives at the call site.
  • Run integration tests in CI against a dedicated low-budget API key. Would detect wire-protocol drift earlier. Rejected on cost and secret-leak surface. Can be revisited if SDK updates ever silently break production.