
ADR-0021 — Agent runtime and tool loop

The execution engine. How an LLM call becomes an agent turn.

By the time this ADR was written, the platform had an LLM adapter interface (ADR-0019) and a concrete Anthropic adapter (ADR-0020). It could make LLM calls but couldn’t run an agent.

The missing piece is the tool loop. An agent turn is rarely one LLM call — the model requests tools, the runtime executes them, the results feed back into another LLM call, repeat until done.

Every consumer that runs agents would otherwise write this loop themselves, with inconsistent budget handling, inconsistent error translation, inconsistent observability. That’s a recipe for the platform’s quality bars eroding one consumer at a time.

This ADR settles:

  • Where the tool loop lives. @agent-platform/runtime.
  • How tool resolution works. A ToolResolver interface in core; concrete registry in a sibling package.
  • The factory shape. createAgentRuntime({adapter, toolResolver, logger, maxIterations?}).
  • The loop’s contract. Iteration cap; turn-level budget enforcement; structured AgentReport output; deterministic stop conditions.

The tool loop is where the safety properties become real. The six-layer context model (ADR-0006) prescribes how context is assembled. The agent runtime is what runs the LLM with that context, executes the tools the LLM asks for, enforces budgets on every iteration, and produces a structured turn record.

Without a centralized runtime:

  • Budgets get checked once per LLM call, not once per turn — meaning a turn with five LLM calls can blow through 5x the intended budget.
  • Tool errors get translated differently per consumer — some raise, some return, some swallow. Debugging gets harder.
  • Iteration limits are ad-hoc. A buggy tool that always asks for another tool can run forever.

The runtime is the single point where these invariants get enforced.
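The turn-boundary check can be sketched in a few lines. This is a hypothetical sketch, not the platform's implementation; the TurnBudget field names are assumptions (the ADR itself only names task.constraints.time_budget_ms).

```typescript
// Sketch of a turn-level budget check. Hypothetical shape: the
// TurnBudget field names are assumptions, modeled on
// task.constraints.time_budget_ms from the ADR.
class TurnBudgetExceededError extends Error {}

interface TurnBudget {
  timeBudgetMs: number;
  costBudgetUsd: number;
}

// Called once per loop iteration, so five LLM calls in one turn still
// share a single time/cost envelope instead of each getting a fresh one.
function checkTurnBudget(
  budget: TurnBudget,
  elapsedMs: number,
  spentUsd: number,
): void {
  if (elapsedMs > budget.timeBudgetMs) {
    throw new TurnBudgetExceededError(
      `time budget exceeded: ${elapsedMs}ms > ${budget.timeBudgetMs}ms`,
    );
  }
  if (spentUsd > budget.costBudgetUsd) {
    throw new TurnBudgetExceededError(
      `cost budget exceeded: $${spentUsd} > $${budget.costBudgetUsd}`,
    );
  }
}
```

The point is where the check sits, not its arithmetic: per-call checks would reset the envelope on every iteration.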

The shape is intentionally small:

interface AgentRuntime {
  runTurn(
    definition: AgentDefinition,
    task: Task,
    context: ContextBundle,
  ): Promise<AgentReport>;
}

function createAgentRuntime(opts: {
  adapter: ModelAdapter;
  toolResolver: ToolResolver;
  logger: Logger;
  maxIterations?: number;
}): AgentRuntime;

Consumers see one method: runTurn. Everything else — the loop, the budget bookkeeping, the error translation — is internal.

The loop, in pseudocode:

for iteration in 0..maxIterations:
    check turn-level budgets (time, cost)
        → throw TurnBudgetExceededError if exceeded
    build ModelRequest with remaining-budget-for-this-call
    call adapter.generate(...)
    if stop_reason == 'end_turn' or 'stop_sequence':
        return AgentReport (success)
    if stop_reason == 'max_tokens':
        return AgentReport with truncation risk
    if stop_reason == 'tool_use':
        for each tool_use block:
            enforce allowed_tools → AutonomyBoundaryError on violation
            resolve tool → if null, tool_result with is_error=true (soft)
            execute tool → capture result or error message
        feed tool_results back into the next iteration's ModelRequest
if iteration cap hit without end_turn:
    throw IterationCapExceededError
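The stop-reason dispatch at the heart of the loop can be sketched in TypeScript. The stop-reason strings mirror the pseudocode; the LoopStep shape is an illustrative stand-in, not the runtime's real types.

```typescript
// Illustrative stand-in types; the real runtime returns a full
// AgentReport rather than this condensed LoopStep.
type StopReason = "end_turn" | "stop_sequence" | "max_tokens" | "tool_use";

interface LoopStep {
  done: boolean;           // true → build and return an AgentReport
  truncationRisk: boolean; // set when the model ran out of tokens
}

function classifyStop(stopReason: StopReason): LoopStep {
  switch (stopReason) {
    case "end_turn":
    case "stop_sequence":
      return { done: true, truncationRisk: false };  // clean finish
    case "max_tokens":
      return { done: true, truncationRisk: true };   // finish, but flagged
    case "tool_use":
      return { done: false, truncationRisk: false }; // run tools, loop again
  }
}
```

Note that only 'tool_use' keeps the loop alive; every other stop reason is a deterministic exit, which is what makes the stop conditions auditable.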

A few design choices baked into this:

  • Budgets are checked at iteration boundaries, not just per LLM call. This is what makes task.constraints.time_budget_ms actually enforceable.
  • Unknown tool name → soft error, not a throw. The runtime returns a tool_result with is_error=true so the LLM can recover (“you tried to call does_not_exist; try a different tool”). Throwing would be the resolver saying “I am broken,” which is a different signal.
  • AutonomyBoundaryError is a hard throw. If the agent tries to call a tool not in its allow-list, that’s a definition violation, not a recoverable LLM mistake. The agent’s allow-list is enforced at the runtime layer, not trusted to the LLM.

AgentReport is the structured turn record. It includes:

  • The agent’s final response (text)
  • The tool calls it made and their results
  • The total LLM cost (in tokens and dollars)
  • The total wall time
  • The number of iterations
  • Any errors caught and softened

Every turn produces one of these; observability hangs off it.
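One plausible AgentReport shape, reconstructed from the bullet list above; every field name here is illustrative, not the platform's actual type.

```typescript
// Hypothetical record of a single tool call within the turn.
interface ToolCallRecord {
  name: string;
  input: unknown;
  result: string;
  isError: boolean;
}

// Hypothetical structured turn record; field names are assumptions
// derived from the list of contents the ADR gives for AgentReport.
interface AgentReport {
  finalResponse: string;        // the agent's final text response
  toolCalls: ToolCallRecord[];  // the tool calls made and their results
  costTokens: number;           // total LLM cost in tokens…
  costUsd: number;              // …and in dollars
  wallTimeMs: number;           // total wall time for the turn
  iterations: number;           // LLM calls made inside the loop
  softenedErrors: string[];     // errors caught and returned as tool_results
}
```

Because every turn emits exactly one of these, metrics, logging, and audit trails can all be derived from the report rather than instrumented into the loop itself.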

Scope limits, stated up front:

  • Does not handle delegation. Sub-agent calls are tools (see ADR-0022). The runtime doesn’t know about them; the toolResolver does. This is the unification that keeps the runtime small.
  • Does not handle streaming. A future runtime variant could stream; today’s runtime returns the full report at the end. The interface is shaped so streaming can be added without breaking changes.
  • Does not handle multi-agent orchestration above the agent level. That’s the application layer. The runtime runs one agent turn; what to do with the report is the caller’s problem.
  • Iteration caps are a blunt instrument. A pathological agent can hit the cap; it gets an IterationCapExceededError. We picked a reasonable default (10) and made it configurable per call. Real abuse cases probably need cost/time budgets rather than iteration counts.
  • The runtime is opinionated about error semantics. Some consumers might prefer different soft/hard categorizations. We picked one set and committed to it; alternative runtime implementations can ship with different choices.

For the original ADR with full Context / Decision / Consequences / Alternatives sections, see ADR-0021 source.

Related decisions:

  • ADR-0019 — LLM adapter interface (what the runtime calls into)
  • ADR-0022 — delegation as tool (the unification trick)
  • ADR-0023 — tool registry (the concrete ToolResolver)