Anthropic

The platform’s primary LLM provider. Every agent turn is an Anthropic API call — main agents on Claude Sonnet, sub-agents on Claude Haiku, with the model choice declared per-agent in the YAML.

Two model tiers, one per agent role:

| Agent role | Model | Used for |
| --- | --- | --- |
| Main agents | Claude Sonnet | Top-level reasoning. Receives the user’s request, decides what tools to call, when to delegate. Examples: triage, merchandising. |
| Sub-agents | Claude Haiku | Focused, narrower tasks. Faster + cheaper per call. Examples: refund_decision, communication. |

The model is part of the agent definition; changing it is editing a YAML file. The runtime treats every model uniformly through the ModelAdapter interface — the rest of the platform doesn’t know which model is in use.
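For concreteness, the shape of that interface can be sketched as follows — a hypothetical sketch, not the actual code in packages/llm/; all names and fields here are illustrative:

```typescript
// Hypothetical sketch of the provider-agnostic interface the runtime sees.
// Field and method names are illustrative, not the actual packages/llm/ API.
export interface ModelMessage {
  role: "user" | "assistant";
  content: string;
}

export interface ModelResult {
  text: string;
  toolCalls: Array<{ name: string; input: unknown }>;
  usage: { inputTokens: number; outputTokens: number };
}

export interface ModelAdapter {
  // One agent turn. `model` comes straight from the agent's YAML declaration,
  // so the runtime never hard-codes a provider or model name.
  complete(opts: {
    model: string;
    system: string;
    messages: ModelMessage[];
    maxTokens?: number;
  }): Promise<ModelResult>;
}

// A no-network fake — handy for exercising the runtime in tests.
export class FakeAdapter implements ModelAdapter {
  async complete(opts: {
    model: string;
    system: string;
    messages: ModelMessage[];
    maxTokens?: number;
  }): Promise<ModelResult> {
    const last = opts.messages[opts.messages.length - 1];
    return {
      text: `echo:${last.content}`,
      toolCalls: [],
      usage: { inputTokens: 0, outputTokens: 0 },
    };
  }
}
```

The point of the shape is the `model` string: because it flows from YAML through a uniform interface, the runtime stays ignorant of which vendor or tier serves a given agent.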

The decision was: which LLM provider?

| Option | Verdict |
| --- | --- |
| Anthropic (Claude family) | Chosen. Best-in-class tool use; clear pricing tiers (Haiku/Sonnet/Opus); strong system-prompt adherence; SDK first-class for streaming + tool calls. |
| OpenAI | Strong contender, especially for older systems. We use OpenAI for embeddings (a different problem). For agent reasoning, Claude’s tool-use behavior was a better fit at platform-design time. |
| Open-source models (Llama, Mistral), self-hosted | Costs more to run at our scale than to call hosted APIs; loses the “no infra” property; quality gap is real for agent reasoning. |
| Google Gemini, Cohere, others | Plausible alternatives. The cost of evaluating “which LLM” forever is high; we picked one, abstracted via ModelAdapter, and moved on. |

The platform isn’t locked to Anthropic — ModelAdapter is the abstraction. Adding an OpenAI adapter for agent reasoning is an afternoon’s work. We chose Anthropic first because the demo matters more than the comparison matrix.
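To illustrate how a second provider would slot in without touching the runtime, here is a hypothetical sketch; `ModelAdapter` below is a minimal stand-in for the real packages/llm/ interface, and the prefix-based lookup is an assumption, not the platform's actual resolution logic:

```typescript
// Minimal stand-in for the provider-agnostic interface in packages/llm/.
type ModelAdapter = { complete(prompt: string): Promise<string> };

// One adapter per provider. The runtime resolves by model-name prefix, so
// "claude-*" and, say, "gpt-*" models could coexist — the rest of the
// platform never learns which vendor serves which agent.
const adapters: Record<string, ModelAdapter> = {
  claude: { complete: async (p) => `anthropic:${p}` }, // stand-in for the Anthropic adapter
  gpt: { complete: async (p) => `openai:${p}` },       // the hypothetical OpenAI adapter
};

function adapterFor(model: string): ModelAdapter {
  const prefix = model.split("-")[0];
  const adapter = adapters[prefix];
  if (!adapter) throw new Error(`no adapter registered for model ${model}`);
  return adapter;
}
```

Under this design, "adding an OpenAI adapter" is one new entry in the registry plus a class that translates that SDK's calls and errors — the per-agent YAML files and the runtime are untouched.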

Anthropic pricing (per million tokens):

| Model | Input | Output |
| --- | --- | --- |
| Claude 3.5 Haiku | $1.00 | $5.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |

Phase 1’s e2e demo run cost roughly:

  • Triage agent: ~3K input tokens, ~500 output tokens (Sonnet) ≈ $0.017
  • Refund decision agent: ~2K input, ~400 output (Haiku) ≈ $0.004
  • Communication agent: ~2K input, ~400 output (Haiku) ≈ $0.004
  • Plus seed embeddings (OpenAI, separate)

Total: ~$0.025 per full demo run in LLM costs. The $0.05 quoted in the runbook is conservative.
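The arithmetic behind those per-turn numbers can be checked in a few lines — a sketch using only the per-million-token prices from the table above:

```typescript
// Back-of-envelope check of the demo-run numbers, using the per-million-token
// prices from the pricing table above.
const PRICES = {
  sonnet: { input: 3.0, output: 15.0 },
  haiku: { input: 1.0, output: 5.0 },
} as const;

function turnCost(
  tier: keyof typeof PRICES,
  inputTokens: number,
  outputTokens: number,
): number {
  const p = PRICES[tier];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

const total =
  turnCost("sonnet", 3_000, 500) + // triage
  turnCost("haiku", 2_000, 400) +  // refund decision
  turnCost("haiku", 2_000, 400);   // communication
// ≈ $0.0245 per run, matching the ~$0.025 figure (OpenAI embeddings excluded)
```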

A production scenario at scale (thousands of triage runs per day) puts this at $25-50/day. If cost becomes a concern, tier models per task aggressively: many decisions in a delegation chain don’t need Sonnet.

A self-hosted LLM stack would mean GPU clusters, model serving (vLLM / TGI / etc.), monitoring, and scaling. Anthropic’s hosted API reduces all of that to an API key.

  • packages/llm-anthropic/ — the Anthropic adapter; uses @anthropic-ai/sdk and translates SDK errors into the platform’s error taxonomy
  • packages/llm/ — the provider-agnostic ModelAdapter interface that the runtime sees
  • apps/worker/agents/*.yaml — per-agent model declarations (model: claude-sonnet-4-5-20250929, etc.)
  • Vendor concentration. All agent reasoning runs through Anthropic. If they have an outage, agents stop working. The ModelAdapter abstraction means we can fail over to another provider, but we don’t do that automatically today.
  • No streaming to the client. The Worker buffers the full agent turn and returns a complete AgentReport. Streaming would be nice for the eventual UI but isn’t needed for the HTTP API.
  • No fine-tuning. Anthropic doesn’t offer fine-tuning today. Our customization happens via system prompts and the YAML characteristics block. This is the right level of control for agent definition; fine-tuning would add a maintenance burden most agents don’t need.
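For concreteness, a per-agent declaration under apps/worker/agents/ might look like this — a hypothetical sketch: only the `model` field (with that exact ID) and the existence of a `characteristics` block are stated above; every other key and value is illustrative:

```yaml
# Hypothetical agent definition. Only `model` and the presence of a
# `characteristics` block are documented above; other keys are illustrative.
name: triage
model: claude-sonnet-4-5-20250929   # switching tiers is a one-line edit here
characteristics:
  tone: concise                     # contents of this block are not documented here
```

Because the runtime only ever sees the model through ModelAdapter, editing that one line is the entire cost of moving an agent between tiers.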