# Anthropic
The platform’s primary LLM provider. Every agent turn is an Anthropic API call — main agents on Claude Sonnet, sub-agents on Claude Haiku, with the model choice declared per-agent in the YAML.
## What we use it for

Two model tiers, one per agent role:
| Agent role | Model | Used for |
|---|---|---|
| Main agents | Claude Sonnet | Top-level reasoning. Receives the user’s request, decides what tools to call, when to delegate. Examples: triage, merchandising. |
| Sub-agents | Claude Haiku | Focused, narrower tasks. Faster + cheaper per call. Examples: refund_decision, communication. |
The model is part of the agent definition; changing it means editing a YAML file. The runtime treats every model uniformly through the `ModelAdapter` interface, so the rest of the platform doesn’t know which model is in use.
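For orientation, here is a minimal sketch of the shape such an interface might take. The real definition lives in `packages/llm/`; every name below (`CompletionRequest`, `CompletionResult`, `ToolSpec`) is illustrative, not the platform’s actual API.

```ts
// Sketch only: the platform’s real ModelAdapter lives in packages/llm/
// and almost certainly differs in detail. These types are illustrative.

export interface ToolSpec {
  name: string;
  description: string;
  inputSchema: object; // JSON Schema for the tool’s arguments
}

export interface CompletionRequest {
  system: string; // system prompt assembled from the agent’s YAML definition
  messages: { role: "user" | "assistant"; content: string }[];
  tools?: ToolSpec[];
  maxTokens?: number;
}

export interface CompletionResult {
  text: string;
  toolCalls: { name: string; input: unknown }[];
  usage: { inputTokens: number; outputTokens: number };
}

// One implementation per provider. The runtime calls complete() and
// never learns which model or vendor sits behind it.
export interface ModelAdapter {
  complete(req: CompletionRequest): Promise<CompletionResult>;
}
```

Under this shape, the Anthropic adapter in `packages/llm-anthropic/` would be one implementation, constructed with whatever model id the per-agent YAML declares.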
## Why we picked it

The decision was: which LLM provider?
| Option | Verdict |
|---|---|
| Anthropic (Claude family) | Chosen. Best-in-class tool use; clear pricing tiers (Haiku/Sonnet/Opus); strong system-prompt adherence; SDK first-class for streaming + tool calls. |
| OpenAI | Strong contender, especially for older systems. We use OpenAI for embeddings (a different problem). For agent reasoning, Claude’s tool-use behavior was a better fit at platform-design time. |
| Open-source models (Llama, Mistral) self-hosted | Costs more to run at our scale than to call hosted APIs; loses the “no infra” property; quality gap is real for agent reasoning. |
| Google Gemini, Cohere, others | Plausible alternatives. The cost of evaluating “which LLM” forever is high; we picked one, abstracted via ModelAdapter, and moved on. |
The platform isn’t locked to Anthropic; `ModelAdapter` is the abstraction. Adding an OpenAI adapter for agent reasoning is an afternoon’s work. We chose Anthropic first because the demo matters more than the comparison matrix.
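To make “an afternoon’s work” concrete, here is a sketch of what a second adapter could look like behind the interface sketched above. The `OpenAIAdapter` class, the import path, and the model id are all hypothetical; only the `openai` SDK calls are real.

```ts
import OpenAI from "openai";
// Hypothetical import path; see the ModelAdapter sketch above.
import type { ModelAdapter, CompletionRequest, CompletionResult } from "./model-adapter";

// Hypothetical second adapter: same interface, different vendor.
export class OpenAIAdapter implements ModelAdapter {
  private client = new OpenAI(); // reads OPENAI_API_KEY from the environment

  async complete(req: CompletionRequest): Promise<CompletionResult> {
    const res = await this.client.chat.completions.create({
      model: "gpt-4o", // stand-in model id
      max_tokens: req.maxTokens ?? 1024,
      messages: [{ role: "system" as const, content: req.system }, ...req.messages],
      tools: req.tools?.map((t) => ({
        type: "function" as const,
        function: { name: t.name, description: t.description, parameters: t.inputSchema },
      })),
    });

    const msg = res.choices[0].message;
    return {
      text: msg.content ?? "",
      toolCalls: (msg.tool_calls ?? []).map((c) => ({
        name: c.function.name,
        input: JSON.parse(c.function.arguments),
      })),
      usage: {
        inputTokens: res.usage?.prompt_tokens ?? 0,
        outputTokens: res.usage?.completion_tokens ?? 0,
      },
    };
  }
}
```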
## What it costs

Anthropic pricing (per million tokens):
| Model | Input | Output |
|---|---|---|
| Claude 3.5 Haiku | $1.00 | $5.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
Phase 1’s e2e demo run cost roughly:
- Triage agent: ~3K input tokens, ~500 output tokens (Sonnet) ≈ $0.017
- Refund decision agent: ~2K input, ~400 output (Haiku) ≈ $0.004
- Communication agent: ~2K input, ~400 output (Haiku) ≈ $0.004
- Plus seed embeddings (OpenAI, separate)
Total: ~$0.025 per full demo run in LLM costs. The $0.05 quoted in the runbook is conservative.
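The arithmetic behind those estimates, spelled out (prices from the table above; token counts are the rough per-agent figures):

```ts
// Prices per million tokens, from the table above.
const PRICE = {
  sonnet: { input: 3.0, output: 15.0 },
  haiku: { input: 1.0, output: 5.0 },
} as const;

// Cost of one agent turn: tokens / 1M, times the per-million price.
function turnCost(model: keyof typeof PRICE, inputTokens: number, outputTokens: number) {
  const p = PRICE[model];
  return (inputTokens / 1e6) * p.input + (outputTokens / 1e6) * p.output;
}

turnCost("sonnet", 3_000, 500); // 0.0165 → triage
turnCost("haiku", 2_000, 400);  // 0.0040 → refund_decision
turnCost("haiku", 2_000, 400);  // 0.0040 → communication
// Sum: 0.0245, i.e. the ~$0.025 per demo run quoted above.
```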
A production scenario at scale (thousands of triage runs per day) puts this at $25-50/day. Tier per task aggressively if cost becomes a concern: many decisions in a delegation chain don’t need Sonnet.
## What it replaces

A self-hosted LLM stack: GPU clusters, model serving (vLLM / TGI / etc.), monitoring, scaling. Anthropic’s hosted API reduces all of this to an API key.
## Where to look

- `packages/llm-anthropic/` — the Anthropic adapter; uses `@anthropic-ai/sdk` and translates SDK errors into the platform’s error taxonomy
- `packages/llm/` — the provider-agnostic `ModelAdapter` interface that the runtime sees
- `apps/worker/agents/*.yaml` — per-agent model declarations (`model: claude-sonnet-4-5-20250929`, etc.)
## Trade-offs we accepted

- Vendor concentration. All agent reasoning runs through Anthropic. If they have an outage, agents stop working. The `ModelAdapter` abstraction means we can fail over to another provider, but we don’t do that automatically today.
- No streaming to the client. The Worker buffers the full agent turn and returns a complete `AgentReport`. Streaming would be nice for the eventual UI but isn’t needed for the HTTP API.
- No fine-tuning. Anthropic doesn’t offer fine-tuning today. Our customization happens via system prompts and the YAML characteristics block. This is the right level of control for agent definition; fine-tuning would add a maintenance burden most agents don’t need.
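On the first trade-off: automatic failover is not implemented, but the abstraction makes the eventual shape easy to picture. A sketch, with hypothetical names throughout; nothing like this exists in the codebase today.

```ts
// Hypothetical import path; see the ModelAdapter sketch earlier on this page.
import type { ModelAdapter, CompletionRequest, CompletionResult } from "./model-adapter";

// Hypothetical failover wrapper: try the primary provider, fall back
// to a secondary one if the call throws.
export class FailoverAdapter implements ModelAdapter {
  constructor(
    private primary: ModelAdapter,
    private secondary: ModelAdapter,
  ) {}

  async complete(req: CompletionRequest): Promise<CompletionResult> {
    try {
      return await this.primary.complete(req);
    } catch {
      // A real implementation would fail over only on outage-shaped
      // errors (timeouts, 5xx), not on bad requests.
      return this.secondary.complete(req);
    }
  }
}
```

Because the rest of the platform only sees `ModelAdapter`, wiring this in would be a change at adapter construction time, not in the runtime.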