ADR-0021: Agent runtime and tool loop

Status: Accepted
Date: 2026-04-22

Context

With the LLM adapter interface (ADR-0019) and the Anthropic concrete adapter (ADR-0020) shipped, the platform can make LLM calls but cannot yet run an agent. An agent is a higher-level construct: a declarative AgentDefinition plus a ContextBundle plus a ModelAdapter plus tool access, executed end to end as one turn.

The missing piece is the tool loop. A turn rarely completes in one LLM call — the model requests tools, the runtime executes them, the results feed back into another LLM call, repeat until done. Every consumer that runs agents would otherwise have to write this loop themselves, with inconsistent budget handling, inconsistent error translation, and inconsistent observability. That’s a recipe for the platform’s bar-5 and bar-10 guarantees quietly eroding one consumer at a time.

ADR-0013 bars that the runtime must enforce or respect:

Bar 5: every agent turn produces a traceable structured record (agent id, task id, duration, LLM call count, tool call count, cost, outcome).
Bar 8: all errors are typed AgentPlatformError subclasses.
Bar 10: TaskConstraints.time_budget_ms and cost_budget_usd are enforced across the entire turn (not just per-LLM-call).
Bar 12: abstractions built deliberately, not speculatively.

Three design questions deferred from earlier ADRs are resolved here: how the tool loop works, where tool resolution happens, and how turn-level budgets compose with per-LLM-call budgets.

Decision

Package placement: extend `@agent-platform/runtime`

The runtime already owns the context assembler. The agent turn executor is the second concern of the same package — “things that execute at runtime, enforcing the invariants declared in core and validated by schemas.” Both concerns share the same customer (the agent-hosting Worker) and the same vocabulary. Splitting them would create a package named @agent-platform/agent-runtime that depends on @agent-platform/runtime, with ambiguous naming and an artificial boundary.

Adding @agent-platform/llm and @agent-platform/logger as dependencies is correct — the runtime is the orchestration layer.

`ToolResolver` interface lives in `@agent-platform/core`

The runtime depends on an interface, not a concrete registry. This matches the ModelAdapter pattern exactly: the interface lives in a types-only package (core for tools, llm for LLM), and the concrete implementation ships later in a sibling package, the same way llm-anthropic ships the LLM concrete implementation.

export interface ToolResolver {
  resolve(name: string): Tool | null;
}

The resolver returns null for unknown names. The runtime treats a missing tool as a soft failure (it returns a tool_result with is_error=true) so the agent can recover. Throwing would be the resolver saying “I am broken,” which is a different signal.
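
The following is a minimal sketch of a resolver, together with one way the runtime might turn a `null` resolution into a soft failure. It assumes a simplified, hypothetical `Tool` shape with an async `execute` method and an Anthropic-style `tool_result` block. The real `Tool` type in `@agent-platform/core` and the eventual concrete registry will differ. `createMapToolResolver` and `runToolCall` are illustrative names only.

```typescript
// Hypothetical Tool shape for illustration; the real Tool type lives in
// @agent-platform/core and may differ.
interface Tool {
  name: string;
  execute(input: unknown): Promise<string>;
}

interface ToolResolver {
  resolve(name: string): Tool | null;
}

// Hypothetical tool_result shape, modeled on Anthropic-style content blocks.
interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error: boolean;
}

// Minimal in-memory resolver: unknown names yield null, never an exception.
function createMapToolResolver(tools: Tool[]): ToolResolver {
  const byName = new Map(tools.map((t) => [t.name, t] as const));
  return { resolve: (name) => byName.get(name) ?? null };
}

// How a runtime might map a null resolution (or a failing tool) to a
// recoverable tool_result instead of throwing.
async function runToolCall(
  resolver: ToolResolver,
  call: { id: string; name: string; input: unknown },
): Promise<ToolResultBlock> {
  const tool = resolver.resolve(call.name);
  if (tool === null) {
    return { type: "tool_result", tool_use_id: call.id, content: `Unknown tool: ${call.name}`, is_error: true };
  }
  try {
    const content = await tool.execute(call.input);
    return { type: "tool_result", tool_use_id: call.id, content, is_error: false };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { type: "tool_result", tool_use_id: call.id, content: message, is_error: true };
  }
}
```

The resolver answers only one question: whether it knows the name. Error handling for both unknown tools and failing tools stays in the runtime.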

`createAgentRuntime({adapter, toolResolver, logger, maxIterations?})` factory

The factory matches createAnthropicAdapter in shape. It returns an AgentRuntime interface with a single method, runTurn(definition, task, context): Promise<AgentReport>.

No class is exported. Consumers see only the interface: tests inject mocks cleanly, the public surface stays deliberately narrow, and a future alternative implementation (e.g. one that streams turns) can ship without changing the public type.
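
To illustrate the factory-plus-interface choice, here is a sketch that uses heavily simplified, hypothetical stand-ins for `AgentDefinition`, `Task`, `ContextBundle`, and `AgentReport`. The adapter, tool resolver, and logger options are omitted. The `maxIterations` validation is an assumption made for illustration, not documented behavior.

```typescript
// Simplified, hypothetical stand-ins for the platform types; the real
// AgentDefinition, Task, ContextBundle, and AgentReport carry more fields.
interface AgentReport {
  agentId: string;
  taskId: string;
  outcome: "completed" | "truncated";
}

interface AgentRuntime {
  runTurn(definition: { id: string }, task: { id: string }, context: unknown): Promise<AgentReport>;
}

// The real options also include adapter, toolResolver, and logger.
interface AgentRuntimeOptions {
  maxIterations?: number;
}

// The factory returns a plain object typed as AgentRuntime. Configuration and
// helpers live in the closure, so runTurn is the only reachable surface.
function createAgentRuntime(options: AgentRuntimeOptions = {}): AgentRuntime {
  const maxIterations = options.maxIterations ?? 10; // default from this ADR
  if (!Number.isInteger(maxIterations) || maxIterations < 1) {
    // Illustrative validation, not documented platform behavior.
    throw new RangeError("maxIterations must be a positive integer");
  }
  return {
    async runTurn(definition, task, _context) {
      // A real implementation runs the tool loop here, capped at maxIterations.
      return { agentId: definition.id, taskId: task.id, outcome: "completed" };
    },
  };
}

// A test double only has to satisfy the interface; no subclassing needed.
const fakeRuntime: AgentRuntime = {
  runTurn: async (definition, task) => ({ agentId: definition.id, taskId: task.id, outcome: "truncated" }),
};
```

Consumers type against `AgentRuntime`, so a test double is just an object literal, and the factory can later return a different implementation without touching consumer code.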

The tool loop

repeat at most maxIterations times:
  check turn-level budgets (time, cost); throw TurnBudgetExceededError if exceeded
  build ModelRequest with the remaining budget for this call, not the full task budget
  response = adapter.generate(request); add its cost to the turn total
  if stop_reason in ('end_turn', 'stop_sequence'): return AgentReport
  if stop_reason == 'max_tokens': return AgentReport flagged with truncation risk
  if stop_reason == 'tool_use':
    append the assistant message (with its tool_use blocks) to the conversation
    for each tool_use block:
      enforce allowed_tools → AutonomyBoundaryError on violation
      resolve tool → if null, tool_result with is_error=true (soft)
      execute tool → capture result, or error message with is_error=true
    append tool results as a user message, loop
throw MaxIterationsError
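
The loop above can be sketched in TypeScript. This is a simplified illustration, not the runtime's implementation. The message, block, and tool shapes are hypothetical stand-ins modeled on Anthropic-style content blocks, and `generate` stands in for `adapter.generate(...)`. Budget checks, cost accumulation, and logging are left out.

```typescript
// Hypothetical, simplified shapes modeled on Anthropic-style content blocks;
// the real ModelRequest/ModelResponse types in @agent-platform/llm differ.
type StopReason = "end_turn" | "stop_sequence" | "max_tokens" | "tool_use";
type ToolUse = { type: "tool_use"; id: string; name: string; input: unknown };
type Block = { type: "text"; text: string } | ToolUse;
type ToolResult = { type: "tool_result"; tool_use_id: string; content: string; is_error: boolean };
type Message = { role: "user" | "assistant"; content: Array<Block | ToolResult> };
type ModelResponse = { stop_reason: StopReason; content: Block[] };
type Tool = { name: string; execute(input: unknown): Promise<string> };

class MaxIterationsError extends Error {}
class AutonomyBoundaryError extends Error {}

async function toolLoop(
  generate: (messages: Message[]) => Promise<ModelResponse>, // stands in for adapter.generate
  resolve: (name: string) => Tool | null, // stands in for ToolResolver.resolve
  allowedTools: ReadonlySet<string>,
  messages: Message[],
  maxIterations = 10,
): Promise<{ outcome: "completed" | "truncated"; iterations: number }> {
  for (let i = 0; i < maxIterations; i++) {
    // Turn-level budget checks would run here, before every LLM call.
    const response = await generate(messages);
    if (response.stop_reason === "end_turn" || response.stop_reason === "stop_sequence") {
      return { outcome: "completed", iterations: i + 1 };
    }
    if (response.stop_reason === "max_tokens") {
      return { outcome: "truncated", iterations: i + 1 };
    }
    // stop_reason is "tool_use": keep the assistant turn so every tool_result
    // that follows has a matching tool_use block earlier in the conversation.
    messages.push({ role: "assistant", content: response.content });
    const results: ToolResult[] = [];
    for (const block of response.content) {
      if (block.type !== "tool_use") continue;
      if (!allowedTools.has(block.name)) {
        // Policy violation: hard stop, the tool is never executed.
        throw new AutonomyBoundaryError(`disallowed tool: ${block.name}`);
      }
      const tool = resolve(block.name);
      if (tool === null) {
        // Unknown tool: soft error the model can recover from.
        results.push({ type: "tool_result", tool_use_id: block.id, content: `unknown tool: ${block.name}`, is_error: true });
        continue;
      }
      try {
        const output = await tool.execute(block.input);
        results.push({ type: "tool_result", tool_use_id: block.id, content: output, is_error: false });
      } catch (err) {
        results.push({ type: "tool_result", tool_use_id: block.id, content: String(err), is_error: true });
      }
    }
    messages.push({ role: "user", content: results });
  }
  throw new MaxIterationsError(`exceeded ${maxIterations} iterations`);
}
```

In this sketch, `max_tokens` ends the turn with `outcome: "truncated"` rather than looping again. This mirrors the decision to report truncation risk instead of retrying.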

Default `maxIterations = 10`

Ten iterations is generous enough for realistic multi-step agent work (research → analyze → draft → validate) and small enough that a stuck agent terminates quickly. The cap is configurable per runtime.

Remaining-budget-per-call, not full-budget-per-call

If the task budget is $1.00 and the first LLM call spent $0.30, the second call’s ModelRequest.cost_budget_usd is $0.70, not $1.00, so the adapter’s pre-flight check uses the correct remaining budget. Time works the same way: each call gets the task’s time budget minus the time already elapsed in the turn. Viewed together, the adapter-level and turn-level checks form a single gate: no single call can exceed the remaining budget, and the turn total can’t exceed the task budget.
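
The remaining-budget arithmetic is small enough to show directly. This sketch assumes hypothetical field names (`costBudgetUsd`, `timeBudgetMs`, `spentUsd`, `startedAtMs`) rather than the real `TaskConstraints` and `ModelRequest` fields.

```typescript
// Hypothetical per-call budget; real ModelRequest field names may differ.
interface CallBudget {
  costBudgetUsd: number;
  timeBudgetMs: number;
}

// Turn-level accounting a runtime would keep across LLM calls (hypothetical).
interface TurnUsage {
  spentUsd: number;
  startedAtMs: number;
}

// Budget for the next call = task budget minus what the turn has already used.
// Returns null when either budget is exhausted, signalling that the caller
// should throw TurnBudgetExceededError instead of making the call.
function remainingBudget(
  task: { costBudgetUsd: number; timeBudgetMs: number },
  usage: TurnUsage,
  nowMs: number,
): CallBudget | null {
  const costBudgetUsd = task.costBudgetUsd - usage.spentUsd;
  const timeBudgetMs = task.timeBudgetMs - (nowMs - usage.startedAtMs);
  if (costBudgetUsd <= 0 || timeBudgetMs <= 0) return null;
  return { costBudgetUsd, timeBudgetMs };
}
```

Take a $1.00 task budget with $0.30 already spent and 15 s of a 60 s time budget elapsed: the next call gets $0.70 and 45,000 ms. Returning `null` for an exhausted budget is a choice made in this sketch. A caller following the loop above would then throw `TurnBudgetExceededError` rather than issue a call with nothing left to spend.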

Adapter errors propagate; exception: `LLMBudgetExceededError` → `TurnBudgetExceededError`

Every other LLM error (LLMAuthError, LLMRateLimitError, etc.) propagates unchanged — the caller’s switch-on-class code works the same whether they called the adapter directly or through the runtime.

LLMBudgetExceededError is reclassified because the semantic unit is “the turn didn’t complete.” The fact that the failure was a pre-flight refusal on one of several LLM calls within the turn is an implementation detail. The runtime wraps it in a TurnBudgetExceededError with the original error as causedBy, so callers who want the detail can still reach it.
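
Here is a sketch of the translation rule. It uses minimal, hypothetical stand-ins for the error classes; the real ones extend `AgentPlatformError` and carry severity and context. `callWithTurnSemantics` is an illustrative name. The `Object.setPrototypeOf` line is the standard TypeScript workaround that keeps `instanceof` working for `Error` subclasses when compiling to ES5.

```typescript
// Hypothetical base so instanceof and name work for every subclass.
class PlatformErrorBase extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof under ES5
    this.name = new.target.name;
  }
}

// Minimal stand-ins for the real LLM and runtime error classes.
class LLMBudgetExceededError extends PlatformErrorBase {}
class LLMRateLimitError extends PlatformErrorBase {}
class TurnBudgetExceededError extends PlatformErrorBase {
  constructor(message: string, readonly causedBy: Error) {
    super(message);
  }
}

// Wraps one adapter call: a pre-flight budget refusal becomes a turn-level
// error (original kept as causedBy); every other error propagates unchanged.
async function callWithTurnSemantics<T>(call: () => Promise<T>): Promise<T> {
  try {
    return await call();
  } catch (err) {
    if (err instanceof LLMBudgetExceededError) {
      throw new TurnBudgetExceededError("turn budget exhausted during an LLM call", err);
    }
    throw err;
  }
}
```

Because only one class is rewritten, a caller's switch-on-class handling for `LLMRateLimitError` and the other adapter errors behaves the same with or without the runtime in between.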

Autonomy enforcement: two layers

Tools that are not in task_constraints.allowed_tools are:

Filtered out of what’s offered to the model. The model never sees them in ModelRequest.tools. Defense in depth.
Hard-rejected if the model calls one anyway. The runtime throws AutonomyBoundaryError carrying the violation kind and the attempted tool name. The model can still hallucinate a tool name it was never offered; the runtime must not execute it, and the event needs to be easy to grep for.

Contrast with unknown tools (resolver returns null): those are soft errors returning tool_result with is_error=true. The distinction matters — an unknown tool is typically a hallucination that the agent can recover from; a disallowed tool is a policy violation that should terminate.
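
The two layers can be sketched as one filter and one guard. The `ToolSpec` shape, the `"disallowed_tool"` violation kind, and the function names are hypothetical. The real `AutonomyBoundaryError` fields may differ.

```typescript
// Hypothetical minimal tool description, as it might appear in ModelRequest.tools.
interface ToolSpec {
  name: string;
  description: string;
}

// Hypothetical stand-in; the real class extends AgentPlatformError.
class AutonomyBoundaryError extends Error {
  constructor(readonly violation: "disallowed_tool", readonly toolName: string) {
    super(`${violation}: ${toolName}`);
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof under ES5
    this.name = "AutonomyBoundaryError";
  }
}

// Layer 1: only allowed tools are ever offered to the model.
function offeredTools(all: ToolSpec[], allowed: ReadonlySet<string>): ToolSpec[] {
  return all.filter((t) => allowed.has(t.name));
}

// Layer 2: a call to a disallowed tool is a hard stop and is never executed.
function assertAllowed(toolName: string, allowed: ReadonlySet<string>): void {
  if (!allowed.has(toolName)) {
    throw new AutonomyBoundaryError("disallowed_tool", toolName);
  }
}
```

The allowed-tools check runs before resolution. As a result, only a name that is allowed but that the resolver cannot find falls through to the soft `tool_result` error path.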

Three new error classes

Class	Severity	When	Caller action
`TurnBudgetExceededError`	warn	Cumulative time/cost crossed `task_constraints`	Raise budget or accept partial
`MaxIterationsError`	error	Tool loop ran past `maxIterations` cap	Investigate (stuck loop) or raise cap
`AutonomyBoundaryError`	error	Agent attempted a disallowed action	Investigate — security-adjacent

These follow the same “one class per distinct caller action” pattern as the seven LLM error classes. Each has a static CODE, a default severity that can be overridden per instance, and documented expectations for its context fields.
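
Here is a sketch of the shared error-class pattern: a static `CODE`, a class-level default severity, and a per-instance override. The base class name, the `CODE` strings, and the options shape are hypothetical. The real `AgentPlatformError` in `@agent-platform/core` may differ.

```typescript
type Severity = "warn" | "error";

interface PlatformErrorOptions {
  severity?: Severity;
  context?: Record<string, unknown>;
}

// Hypothetical sketch of the pattern, not the real AgentPlatformError.
class PlatformErrorSketch extends Error {
  static readonly CODE: string = "PLATFORM_ERROR";
  static readonly DEFAULT_SEVERITY: Severity = "error";
  readonly code: string;
  readonly severity: Severity;
  readonly context: Record<string, unknown>;

  constructor(message: string, options: PlatformErrorOptions = {}) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof under ES5
    const cls = new.target; // the concrete subclass being constructed
    this.name = cls.name;
    this.code = cls.CODE;
    this.severity = options.severity ?? cls.DEFAULT_SEVERITY; // per-instance override wins
    this.context = options.context ?? {};
  }
}

// Subclasses only declare their code and, where it differs, their default severity.
class TurnBudgetExceededError extends PlatformErrorSketch {
  static readonly CODE = "TURN_BUDGET_EXCEEDED";
  static readonly DEFAULT_SEVERITY: Severity = "warn";
}

class MaxIterationsError extends PlatformErrorSketch {
  static readonly CODE = "MAX_ITERATIONS";
}
```

Reading defaults through `new.target` keeps each subclass a two-line declaration, while callers can still switch on the class or on `code`.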

Structured logging: four event types

turn_started — agent, task, task_type, tier
tool_call — agent, task, tool, duration_ms, outcome (ok/error)
turn_failed — agent, task, duration_ms, counters, error_class (when the runtime throws)
turn_completed — agent, task, duration_ms, counters, cost_usd (emitted on all exit paths via finally)

The finally-emitted turn_completed is the key observability property: regardless of whether the turn succeeded or threw, there’s one line you can grep for per turn. Combined with the adapter’s per-call llm_call entries and the runtime’s tool_call entries, a single turn produces a complete, grep-able trace.
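
The finally-based guarantee can be sketched as a wrapper around the turn body. The `Logger` shape, the field names, and `withTurnLogging` are hypothetical. For brevity, `turn_started` omits the task_type and tier fields.

```typescript
// Hypothetical logger shape; the real @agent-platform/logger API may differ.
type LogEvent = { event: string; [field: string]: unknown };
type Logger = { info(e: LogEvent): void; error(e: LogEvent): void };

// Runs a turn body so that turn_completed is logged exactly once on every exit
// path, with turn_failed logged first when the body throws.
async function withTurnLogging<T>(
  logger: Logger,
  ids: { agent: string; task: string },
  counters: { llm_calls: number; tool_calls: number; cost_usd: number },
  body: () => Promise<T>,
): Promise<T> {
  const started = Date.now();
  logger.info({ event: "turn_started", ...ids });
  try {
    return await body();
  } catch (err) {
    logger.error({
      event: "turn_failed",
      ...ids,
      duration_ms: Date.now() - started,
      llm_calls: counters.llm_calls,
      tool_calls: counters.tool_calls,
      error_class: err instanceof Error ? err.name : typeof err,
    });
    throw err; // the caller still sees the typed error
  } finally {
    logger.info({ event: "turn_completed", ...ids, duration_ms: Date.now() - started, ...counters });
  }
}
```

`turn_completed` sits in `finally`. On the error path it follows `turn_failed`, and on the success path it appears alone, so every turn yields exactly one `turn_completed` line.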

Consequences

Bars 5, 8, and 10 are now fully mechanical across the entire turn layer. Previous ADRs made them mechanical per-LLM-call; this ADR extends enforcement to the turn level where task constraints actually live.
A consumer can run an agent turn in about 10 lines: construct an adapter, construct the runtime, and call runTurn(definition, task, context). Observability, budget enforcement, error typing, and the tool loop are all handled by the runtime.
Delegation is visibly deferred. The runtime handles one agent, not a graph. An attempt to delegate (for example, a hypothetical future tool_use on a “delegate” tool) would be trapped by the autonomy enforcement, as would any other orchestrator-level concern. That’s the right scope for this ADR; the orchestrator gets its own.
Memory and conversation-history are visibly deferred. The runtime takes a ContextBundle as input (already including working/long-term memory layers from the assembler) and does not append to it during the turn. Multi-turn conversations are the caller’s responsibility until the memory ADR lands.
The consumer layer is thin by design. The runtime does orchestration, not policy. Retry policy, long-running-task splitting, delegation, human-in-the-loop approval — all are the caller’s concern, implemented above the runtime using the typed errors as control-flow signals.

Consequences for the repo

@agent-platform/runtime now depends on @agent-platform/llm and @agent-platform/logger. The dependency graph correctly reflects orchestration-on-top-of-adapter-and-logger.
@agent-platform/core gains a single new interface (ToolResolver). No breaking changes.
46 new tests (27 error conformance + 19 runtime integration). Workspace total: 354 passing + 2 skipped.
The first end-to-end agent turn is now possible. Every subsequent Phase 1 component (memory, tool registry, orchestrator) sits on top of a working turn executor rather than a theoretical one.

Alternatives considered

New package @agent-platform/agent-runtime. Rejected: the context assembler and turn executor share the same customer, the same mental model, and the same ADR chain. Two packages would create an artificial boundary with no payoff.
Tool resolver in @agent-platform/llm. Rejected: tools are a first-class platform concept (Tool, ToolCall, etc. already live in core), not an LLM concept. Putting the resolver with the LLM package would couple them unnecessarily.
Single-round runtime, where runTurn returns after one LLM call and the caller handles the tool loop. Rejected: the implementation is simpler, but it pushes the loop onto every consumer. That defeats bar 5 (every consumer would reimplement per-turn tracing) and bar 10 (every consumer would reimplement budget enforcement). The loop is exactly the shared thing worth centralizing.
Streaming runTurn returning an async iterable of events. Deferred: it is cleaner for UIs that want to show reasoning as it happens. However, it adds real complexity to budget enforcement (streaming costs are harder to accumulate mid-call) and requires the adapter interface to support streaming (deferred per ADR-0019). When a UI use case actually demands this, we add runTurnStream as a second method, an addition to the interface rather than a reshape.
Include retry logic in the runtime. Tempting because many failure modes are retriable. Rejected: retry policy depends on context the runtime doesn’t have (is this a user-facing request where latency matters more than success? is this a background job where we want aggressive retry?). The typed error classes give the caller everything needed to decide. Retry lives above the runtime, not inside.
AgentRuntime as a class rather than a factory + interface. Rejected: the factory matches createAnthropicAdapter, keeps the implementation private, and lets us swap implementations without breaking consumer type references.
Run the runtime inside a Worker Durable Object automatically. Properties like “single-instance-per-agent” or “cross-turn state persistence” feel naturally runtime-shaped. Deferred: Durable Object ownership is an orchestration concern, not a turn-execution concern, and the later orchestrator ADR decides it.
Enforce requires_human_approval inside the runtime. Tempting because AutonomyBoundaryError has that in its surface. Deferred: human-in-the-loop flows require a persistence and callback mechanism that doesn’t exist yet. When the orchestrator ships, we revisit. For now the runtime recognizes the field exists but doesn’t enforce it; a turn that would trigger approval just runs to completion.
