# ADR-0020: Anthropic concrete LLM adapter

Status: Accepted
Date: 2026-04-22
## Context

ADR-0019 committed the
provider-agnostic ModelAdapter interface, the seven-class error
taxonomy, and the two-package split (@agent-platform/llm for the
interface; one sibling package per provider for concrete implementations).
What it did not commit was the concrete implementation for any
specific provider.
Session 2 of the LLM adapter work ships the first concrete adapter:
@agent-platform/llm-anthropic. The questions this ADR answers are
all decisions that surface only at the implementation layer — how SDK
errors are classified, how budgets are enforced at millisecond
granularity, how pricing is tracked, how the SDK’s behavior is
controlled. The interface ADR deliberately deferred these.
## Decision

### Build against @anthropic-ai/sdk directly, not Vercel’s AI SDK

We use @anthropic-ai/sdk@0.90.x
as a workspace dependency. Vercel’s @ai-sdk/anthropic was considered
and rejected in ADR-0019 for the interface; the decision applies here
too. The official SDK:
- Gives us direct access to every `APIError` subclass, so translation is class-based, not code-switching on strings.
- Supports a `fetch` override for unit tests (scripted HTTP).
- Runs on Cloudflare Workers natively (per its README).
- Has no transitive Node-only dependencies.
### Disable SDK-level retries (maxRetries: 0)

The SDK default is 2. Our seven-class error taxonomy is explicit about
what’s retriable; retry policy is the caller’s concern, not the
adapter’s. SDK-level retries interact poorly with time_budget_ms
(each SDK retry eats into the same budget without the caller
knowing), and muddy rate-limit accounting because a 429 might be the
SDK’s second attempt rather than the first.
### Pricing table as code, with fallback = Opus rates

`src/pricing.ts` hard-codes the current-generation rates:
| Tier | Model | Input $/MTok | Output $/MTok |
|---|---|---|---|
| critical | claude-opus-4-6 | 5.00 | 25.00 |
| main | claude-sonnet-4-6 | 3.00 | 15.00 |
| sub | claude-haiku-4-5-20251001 | 1.00 | 5.00 |
Values verified April 2026. Pricing changes ~yearly; a change is a code edit + review + redeploy, not a runtime config. This matches the bar-12 discipline: don’t build configurability before you need it.
Unknown models fall back to Opus rates (the most expensive). If
someone passes tier: 'main' with a modelMap pointing at a model
we don’t know about, the pre-flight budget check estimates with Opus
pricing. This is deliberate — a budget check should over-refuse
when facing unknowns, never under-refuse. A new Claude generation
that lands in someone’s modelMap before our pricing table updates
should fail conservatively, not silently pass cheap estimates.
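As a sketch, the lookup with the Opus fallback could look like this (rates come from the table above; names like `ratesFor` are illustrative, not the adapter’s actual exports):

```typescript
// Rates in USD per million tokens, from the pricing table above
// (values verified April 2026).
interface ModelRates {
  inputPerMTok: number;
  outputPerMTok: number;
}

const PRICING: Record<string, ModelRates> = {
  'claude-opus-4-6': { inputPerMTok: 5.0, outputPerMTok: 25.0 },
  'claude-sonnet-4-6': { inputPerMTok: 3.0, outputPerMTok: 15.0 },
  'claude-haiku-4-5-20251001': { inputPerMTok: 1.0, outputPerMTok: 5.0 },
};

// Unknown models fall back to the most expensive (Opus) rates so the
// pre-flight budget check over-refuses rather than under-refuses.
function ratesFor(model: string): ModelRates {
  return PRICING[model] ?? PRICING['claude-opus-4-6'];
}
```

The fallback is the whole point of the function: a miss in the table must never produce a cheaper estimate than any real model would.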
### Token estimation: 3 characters per token

Real English averages ~4 chars/token; we use 3 to over-estimate
input size by ~33%. For a pre-flight budget check, the right
direction of error is high. Output cost uses the full max_tokens
as the upper bound.
We deliberately do not depend on tiktoken or @anthropic-ai/tokenizer.
Accurate tokenization would add bundle weight for marginal accuracy
gain; consumers needing penny-accurate pre-flight do their own
tokenization outside the adapter.
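A minimal sketch of the estimate under these rules (function names are illustrative; the real `src/pricing.ts` API may differ):

```typescript
// Deliberately below the ~4 chars/token real-world average, so the
// input estimate runs ~33% high. For a budget check, high is the
// right direction of error.
const CHARS_PER_TOKEN = 3;

// Conservative input-token estimate for the pre-flight budget check.
function estimateInputTokens(prompt: string): number {
  return Math.ceil(prompt.length / CHARS_PER_TOKEN);
}

// Worst-case cost: over-estimated input plus the full max_tokens as
// the output upper bound. Rates are USD per million tokens.
function estimateMaxCostUsd(
  prompt: string,
  maxTokens: number,
  inputPerMTok: number,
  outputPerMTok: number,
): number {
  const inputTokens = estimateInputTokens(prompt);
  return (inputTokens * inputPerMTok + maxTokens * outputPerMTok) / 1_000_000;
}
```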
### Time-budget enforcement distinguishes our timeout from caller abort

The adapter creates its own `AbortController`. If `time_budget_ms` is
set, a setTimeout fires ownController.abort() at that deadline.
If the caller also passes abort_signal, we forward its abort into
ownController. The SDK sees a single signal.
On catch, we distinguish three cases:

- `ownController.signal.aborted && !request.abort_signal?.aborted` → our timeout fired. Throw `LLMTimeoutError` with `elapsed_ms` and `budget_ms` in context.
- `request.abort_signal?.aborted` → caller’s abort fired. Re-throw the SDK’s `APIUserAbortError` unchanged. This is user intent, not a platform failure; callers handling abort in their own try/catch expect that error shape.
- Otherwise → translate via `translateSdkError`.
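The signal linking described above can be sketched as follows (names like `linkAbort` are illustrative, not the adapter’s actual internals):

```typescript
// Link an optional caller signal and an optional time budget into one
// controller whose signal is what the SDK sees.
function linkAbort(timeBudgetMs?: number, callerSignal?: AbortSignal) {
  const own = new AbortController();

  let timer: ReturnType<typeof setTimeout> | undefined;
  if (timeBudgetMs !== undefined) {
    // Our timeout: fires own.abort() at the deadline.
    timer = setTimeout(() => own.abort(), timeBudgetMs);
  }

  if (callerSignal) {
    // Forward the caller's abort into our controller.
    if (callerSignal.aborted) own.abort();
    else callerSignal.addEventListener('abort', () => own.abort(), { once: true });
  }

  // The adapter clears the timer once the request settles.
  return { controller: own, clearTimer: () => clearTimeout(timer) };
}
```

On catch, `controller.signal.aborted && !callerSignal?.aborted` is then the marker for "our timeout fired", matching the first case above.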
### Error translation table

Every class exported from @anthropic-ai/sdk maps to exactly one
LLM error class:
| SDK | Platform | Severity | Notes |
|---|---|---|---|
| AuthenticationError | LLMAuthError | fatal | bad/missing API key |
| PermissionDeniedError | LLMAuthError | fatal | account lacks model access |
| RateLimitError | LLMRateLimitError | error | retry_after_ms from header when present |
| BadRequestError + context-length phrasing | LLMContextLengthError | error | |
| BadRequestError (generic) | LLMInvalidRequestError | error | |
| UnprocessableEntityError | same split as 400 | error | |
| NotFoundError | LLMInvalidRequestError | error | usually unknown model |
| ConflictError | LLMInvalidRequestError | error | |
| InternalServerError | LLMUnavailableError | error | |
| APIConnectionTimeoutError | LLMUnavailableError | error | distinct from our LLMTimeoutError |
| APIConnectionError | LLMUnavailableError | error | network / DNS / TLS |
| APIUserAbortError | re-thrown unchanged (caller intent) or LLMTimeoutError (our timeout) | — | branch on which signal fired |
| Unknown APIError subclass | LLMUnavailableError | error | safe default for new SDK versions |
| Non-Error thrown value | LLMUnavailableError | error | extremely rare but possible |
Context-length detection on 400/422 is a message-substring match
against phrases like “prompt is too long”, “input is too long”,
“maximum context length”. If the heuristic misses, the classification
falls back to LLMInvalidRequestError — both are 4xx, both
non-retriable without changes, so the behavioral impact is small.
The distinction matters only for caller retry strategy (shrink-and-retry
vs. fix-code-and-retry).
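The heuristic might be sketched as follows (the phrase list is taken from above; the function name is illustrative):

```typescript
// Substrings observed in Anthropic context-length 400/422 messages.
const CONTEXT_LENGTH_PHRASES = [
  'prompt is too long',
  'input is too long',
  'maximum context length',
];

// On a 400/422, decide between LLMContextLengthError and the generic
// LLMInvalidRequestError. A miss is cheap: both are non-retriable 4xx,
// so misclassification only affects the caller's retry strategy.
function isContextLengthMessage(message: string): boolean {
  const lower = message.toLowerCase();
  return CONTEXT_LENGTH_PHRASES.some((phrase) => lower.includes(phrase));
}
```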
### Consumer owns the logger’s component tag

The adapter receives a Logger via constructor and logs with it
directly — it does not call logger.child({ component: 'llm.anthropic' }).
The @agent-platform/logger package enforces component as an
immutable fixed field (an anti-spoofing property, so a payload can’t
inject a false component name). Calling child({component: ...})
silently does nothing.
Callers construct their logger with the right component:

```ts
const llmLogger = new JsonLogger({ component: 'llm.anthropic' });
const adapter = createAnthropicAdapter({ ..., logger: llmLogger });
```

Or wire a scope tag into bindings at the call site:

```ts
const agentLogger = new JsonLogger({ component: 'marketing-agent' });
const llmLogger = agentLogger.child({ scope: 'llm.anthropic' });
```

This pattern is documented in the factory options with a code example.
### Refusal and pause_turn stop reasons collapse to end_turn + warn log

The SDK’s StopReason union includes 'refusal' and 'pause_turn'
in addition to our four. Rather than silently drop them or make up a
mapping, the adapter:
- Maps both to `'end_turn'` (the turn is semantically complete).
- Emits a warn-level log entry (`llm_refusal` or `llm_pause_turn`) so the signal isn’t lost.
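A sketch of the collapse-and-warn mapping (the source names only `end_turn`; the other three platform stop reasons shown here are assumptions, and the `warn` callback stands in for the injected logger):

```typescript
// Assumed: the platform's four stop reasons. Only 'end_turn' is named
// in this ADR; the rest are illustrative.
type PlatformStopReason = 'end_turn' | 'max_tokens' | 'stop_sequence' | 'tool_use';
type SdkStopReason = PlatformStopReason | 'refusal' | 'pause_turn';

function mapStopReason(
  sdk: SdkStopReason,
  warn: (event: string) => void,
): PlatformStopReason {
  if (sdk === 'refusal' || sdk === 'pause_turn') {
    // Collapse to end_turn (the turn is semantically complete),
    // but keep the signal visible in the logs.
    warn(sdk === 'refusal' ? 'llm_refusal' : 'llm_pause_turn');
    return 'end_turn';
  }
  return sdk;
}
```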
This is a compromise. A cleaner answer is to amend
ADR-0019’s StopReason union to
include 'refusal', and add a corresponding field to ModelResponse
so callers can branch on it. This is a tracked follow-up; it has
not been done in this ADR because we haven’t yet seen the first
agent’s behavior to know whether the information needs to reach the
caller directly.
### Integration tests gated by ANTHROPIC_API_KEY

`adapter.integration.test.ts` makes real API calls. Tests auto-skip
when the env var is absent, which is CI’s default. Local developers
run them manually after SDK upgrades to verify wire-compatible
behavior. Each test uses Haiku + tiny prompts to stay under $0.001 per
run.
The choice not to have CI run these is deliberate:
- CI would need a real API key, which is a secret leak surface.
- CI would burn tokens on every PR — cost scales with contributor volume.
- The SDK’s behavior is stable between releases; when it drifts, the unit tests (using scripted `fetch`) catch the breakage first.
## Consequences

- Bar 5 is now fully mechanical. Every `adapter.generate()` call emits `llm_call` with model, tier, latency, tokens, cost, and stop reason. A consumer logs zero boilerplate to satisfy bar 5.
- Bar 10 is now fully mechanical. Both `time_budget_ms` and `cost_budget_usd` are enforced inside the adapter. Consumers cannot forget to wire them through.
- We own every error the caller sees. No raw `AnthropicError` leaks past the adapter boundary. A consumer’s `catch` block only ever sees `LLMAuthError | LLMRateLimitError | LLMTimeoutError | LLMContextLengthError | LLMUnavailableError | LLMInvalidRequestError | LLMBudgetExceededError` (and `APIUserAbortError` only if the consumer passed their own `abort_signal`).
- Pricing changes require a code-reviewed PR. Acceptable for a ~yearly-churn table; the tradeoff is zero runtime config ceremony.
- Unknown models never bypass budget checks. The fallback-to-Opus pricing policy is the explicit guard. Future ADRs that add new providers or new routing mechanisms should preserve this property.
- Refusal-as-end-turn is a known lossy mapping. Follow-up item for ADR-0019 amendment. Logged at warn, not silent.
## Consequences for the repo

- New workspace package: `packages/llm-anthropic/`. Depends on `@anthropic-ai/sdk@0.90.0`, `@agent-platform/llm`, `@agent-platform/logger`, `@agent-platform/errors`, and `@agent-platform/core`. First runtime dependency on an external SDK in the platform.
- 65 new tests (13 pricing, 31 translate-errors, 21 adapter). 2 integration tests are skipped unless `ANTHROPIC_API_KEY` is set. Workspace total: 308 passed + 2 skipped.
- A “first real LLM call” becomes possible. Any future component that takes a `ModelAdapter` can be exercised end-to-end by supplying `createAnthropicAdapter(...)`.
## Alternatives considered

- Mock the SDK via `vi.mock` rather than inject a fake `fetch`. Simpler test setup, but it tests the mock rather than the translation code. Our error taxonomy is the single most valuable thing to test, and mocking loses confidence that translation actually works against real SDK behavior. Rejected.
- Let the SDK handle retries with `maxRetries: 2`. Default behavior; saves the caller from writing retry logic. Rejected because the SDK’s retry policy doesn’t match our typed-error taxonomy: it retries on specific status codes without knowing whether the caller has context-length vs. rate-limit semantics, and it eats into our `time_budget_ms` invisibly.
- Use a third-party pricing API / database. Rejected: pricing changes slowly enough that code + PR + redeploy is the right cadence. A runtime fetch adds a failure mode for every adapter call.
- Include `tiktoken` for accurate pre-flight token counting. ~700 KB dependency. Rejected: the 3-chars-per-token conservative estimate is good enough for “this request will obviously blow the budget” detection, which is the bar-10 requirement. Penny-accuracy isn’t the point.
- Surface `refusal` and `pause_turn` as new `StopReason` values today. Cleaner than the collapse-and-warn pattern, but amends ADR-0019 before we know whether callers need the distinction. Deferred to a future ADR driven by real consumer needs.
- Have the adapter construct its own logger internally from config. Breaks bar-7 injection. The caller-owns-logger pattern is the right architecture even if it means the `component` tag lives at the call site.
- Run integration tests in CI against a dedicated low-budget API key. Would detect wire-protocol drift earlier. Rejected on cost and secret-leak surface. Can be revisited if SDK updates ever silently break production.