ADR-0021: Agent runtime and tool loop
Status: Accepted
Date: 2026-04-22
Context
With the LLM adapter interface (ADR-0019) and Anthropic concrete (ADR-0020) shipped, the platform can make LLM calls but cannot yet run an agent. An agent is a higher-level construct: a declarative AgentDefinition plus a ContextBundle plus a ModelAdapter plus tool access, executed end-to-end as one turn.
The missing piece is the tool loop. A turn rarely completes in one LLM call — the model requests tools, the runtime executes them, the results feed back into another LLM call, repeat until done. Every consumer that runs agents would otherwise have to write this loop themselves, with inconsistent budget handling, inconsistent error translation, and inconsistent observability. That’s a recipe for the platform’s bar-5 and bar-10 guarantees quietly eroding one consumer at a time.
ADR-0013 bars that the runtime must enforce:
- Bar 5: every agent turn produces a traceable structured record (agent id, task id, duration, LLM call count, tool call count, cost, outcome).
- Bar 8: all errors are typed AgentPlatformError subclasses.
- Bar 10: TaskConstraints.time_budget_ms and cost_budget_usd are enforced across the entire turn (not just per-LLM-call).
- Bar 12: abstractions built deliberately, not speculatively.
Three design questions deferred from earlier ADRs are resolved here: how the tool loop works, where tool resolution happens, and how turn-level budgets compose with per-LLM-call budgets.
Decision
Package placement: extend @agent-platform/runtime
The runtime already owns the context assembler. The agent turn executor is the second concern of the same package: “things that execute at runtime, enforcing the invariants declared in core and validated by schemas.” Both concerns share the same customer (the agent-hosting Worker) and the same vocabulary. Splitting them would create a package named @agent-platform/agent-runtime that depends on @agent-platform/runtime, with ambiguous naming and an artificial boundary.
Adding @agent-platform/llm and @agent-platform/logger as dependencies is correct — the runtime is the orchestration layer.
ToolResolver interface lives in @agent-platform/core
The runtime depends on an interface, not a concrete registry. Matches the ModelAdapter pattern exactly: interface in a types-only package (core for tools, llm for LLM), concrete implementation in a sibling package (to be shipped later, same way llm-anthropic ships the LLM concrete).

    export interface ToolResolver {
      resolve(name: string): Tool | null;
    }

resolve returns null for unknown names. The runtime interprets a missing tool as a soft failure (it returns a tool_result with is_error=true) so the agent can recover. Throwing would be the resolver saying “I am broken,” which is a different signal.
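A minimal sketch of how a resolver and the soft-failure path could fit together. The Tool and ToolResultBlock shapes and the createMapResolver and runTool helpers are assumptions made for illustration; this ADR fixes only the ToolResolver signature.

```typescript
// Hypothetical minimal Tool shape; the real type lives in @agent-platform/core.
interface Tool {
  name: string;
  execute(input: unknown): Promise<string>;
}

interface ToolResolver {
  resolve(name: string): Tool | null;
}

// Hypothetical shape of the tool_result fed back to the model.
interface ToolResultBlock {
  type: "tool_result";
  tool_name: string;
  content: string;
  is_error: boolean;
}

// Assumed helper: a resolver backed by an in-memory map.
function createMapResolver(tools: Tool[]): ToolResolver {
  const byName = new Map<string, Tool>();
  for (const tool of tools) byName.set(tool.name, tool);
  return { resolve: (name) => byName.get(name) ?? null };
}

// Unknown tool -> soft failure the agent can recover from, never a throw.
async function runTool(
  resolver: ToolResolver,
  name: string,
  input: unknown,
): Promise<ToolResultBlock> {
  const tool = resolver.resolve(name);
  if (tool === null) {
    return { type: "tool_result", tool_name: name, content: `Unknown tool: ${name}`, is_error: true };
  }
  try {
    const content = await tool.execute(input);
    return { type: "tool_result", tool_name: name, content, is_error: false };
  } catch (err) {
    // A failing tool is also reported to the model rather than aborting the turn.
    const message = err instanceof Error ? err.message : String(err);
    return { type: "tool_result", tool_name: name, content: message, is_error: true };
  }
}
```

Under these assumptions, a resolver that throws still surfaces as an exception from runTool, which keeps "unknown tool" and "resolver is broken" distinguishable.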
createAgentRuntime({adapter, toolResolver, logger, maxIterations?}) factory
Matches createAnthropicAdapter in shape. Returns an AgentRuntime interface with a single runTurn(definition, task, context): Promise<AgentReport> method.
No class exported. Consumers see only the interface — tests inject mocks cleanly, the public surface is deliberately narrow, and a future alternative implementation (e.g. one that streams turns) can ship without changing the public type.
The tool loop

    for iteration in 0..maxIterations:
      check turn-level budgets (time, cost); throw TurnBudgetExceededError if exceeded
      build ModelRequest with remaining-budget-for-this-call, not full-budget
      call adapter.generate(...)
      if stop_reason == 'end_turn' or 'stop_sequence': return AgentReport
      if stop_reason == 'max_tokens': return AgentReport with truncation risk
      if stop_reason == 'tool_use':
        for each tool_use block:
          enforce allowed_tools → AutonomyBoundaryError on violation
          resolve tool → if null, tool_result with is_error=true (soft)
          execute tool → capture result or error message
        append tool results as a user message, loop
    throw MaxIterationsError

Default maxIterations = 10
Generous enough for realistic multi-step agent work (research → analyze → draft → validate), small enough that a stuck agent terminates quickly. Configurable per-runtime.
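The tool loop described above could be sketched in TypeScript roughly as follows. Every type here is a simplified stand-in invented for the example (the real ModelRequest, AgentReport, and error classes carry more fields), and runTurn takes its collaborators as plain arguments rather than through createAgentRuntime.

```typescript
// All shapes below are simplified assumptions, not the platform's real types.
type StopReason = "end_turn" | "stop_sequence" | "max_tokens" | "tool_use";
interface ToolUse { id: string; name: string; input: unknown }
interface ModelResponse { stop_reason: StopReason; text: string; tool_uses: ToolUse[]; cost_usd: number }
interface ModelRequest { messages: unknown[]; time_budget_ms: number; cost_budget_usd: number }
interface ModelAdapter { generate(req: ModelRequest): Promise<ModelResponse> }
interface Tool { execute(input: unknown): Promise<string> }
interface ToolResolver { resolve(name: string): Tool | null }
interface Constraints { time_budget_ms: number; cost_budget_usd: number; allowed_tools: string[] }
interface AgentReport { text: string; truncated: boolean; llm_calls: number; tool_calls: number; cost_usd: number }

class TurnBudgetExceededError extends Error {}
class MaxIterationsError extends Error {}
class AutonomyBoundaryError extends Error {}

async function runTurn(
  adapter: ModelAdapter,
  resolver: ToolResolver,
  c: Constraints,
  prompt: string,
  maxIterations = 10,
): Promise<AgentReport> {
  const started = Date.now();
  const messages: unknown[] = [{ role: "user", content: prompt }];
  let cost = 0;
  let llmCalls = 0;
  let toolCalls = 0;

  for (let i = 0; i < maxIterations; i++) {
    // Turn-level budget check before every LLM call.
    const timeLeft = c.time_budget_ms - (Date.now() - started);
    const costLeft = c.cost_budget_usd - cost;
    if (timeLeft <= 0 || costLeft <= 0) throw new TurnBudgetExceededError("turn budget exhausted");

    // The request carries the remaining budget, not the full task budget.
    const res = await adapter.generate({ messages, time_budget_ms: timeLeft, cost_budget_usd: costLeft });
    llmCalls++;
    cost += res.cost_usd;

    if (res.stop_reason !== "tool_use") {
      // end_turn, stop_sequence, or max_tokens (reported as truncated).
      return { text: res.text, truncated: res.stop_reason === "max_tokens", llm_calls: llmCalls, tool_calls: toolCalls, cost_usd: cost };
    }

    const results: unknown[] = [];
    for (const use of res.tool_uses) {
      // Policy violation: hard stop.
      if (!c.allowed_tools.includes(use.name)) throw new AutonomyBoundaryError(`disallowed tool: ${use.name}`);
      const tool = resolver.resolve(use.name);
      toolCalls++;
      let content = "";
      let is_error = false;
      if (tool === null) {
        // Unknown tool: soft error the model can recover from.
        content = `unknown tool: ${use.name}`;
        is_error = true;
      } else {
        try {
          content = await tool.execute(use.input);
        } catch (err) {
          content = err instanceof Error ? err.message : String(err);
          is_error = true;
        }
      }
      results.push({ type: "tool_result", tool_use_id: use.id, content, is_error });
    }
    // Tool results go back to the model as a user message, then loop.
    messages.push({ role: "assistant", content: res.tool_uses }, { role: "user", content: results });
  }
  throw new MaxIterationsError(`no final answer after ${maxIterations} iterations`);
}
```

The sketch leaves out logging and error re-classification so the control flow stays readable.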
Remaining-budget-per-call, not full-budget-per-call
If the task budget is $1.00 and the first LLM call spent $0.30, the second call’s ModelRequest.cost_budget_usd is $0.70, not $1.00. The adapter’s pre-flight check then uses the correct remaining budget. Same for time. This composition is correct when the adapter-level and turn-level checks are viewed as a single gate: the call can’t exceed remaining budget, and the total can’t exceed task budget.
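The arithmetic is simple enough to show directly. The helper name remainingBudget and the TaskBudget and Spent shapes are hypothetical; clamping at zero is also an assumption, on the view that an exhausted budget should read as 0 rather than negative.

```typescript
interface TaskBudget { time_budget_ms: number; cost_budget_usd: number }
interface Spent { elapsed_ms: number; cost_usd: number }

// Budget to place on the next ModelRequest: task budget minus what the turn has used so far.
function remainingBudget(task: TaskBudget, spent: Spent): TaskBudget {
  return {
    time_budget_ms: Math.max(0, task.time_budget_ms - spent.elapsed_ms),
    cost_budget_usd: Math.max(0, task.cost_budget_usd - spent.cost_usd),
  };
}

const taskBudget: TaskBudget = { time_budget_ms: 30000, cost_budget_usd: 1.0 };
const nextCall = remainingBudget(taskBudget, { elapsed_ms: 4000, cost_usd: 0.3 });
console.log(nextCall.cost_budget_usd.toFixed(2)); // "0.70"
console.log(nextCall.time_budget_ms); // 26000
```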
Adapter errors propagate; exception: LLMBudgetExceededError → TurnBudgetExceededError
Every other LLM error (LLMAuthError, LLMRateLimitError, etc.) propagates unchanged, so the caller’s switch-on-class code works the same whether they called the adapter directly or through the runtime.
LLMBudgetExceededError is re-classified because the semantic unit is “the turn didn’t complete,” and the specific failure mode of a pre-flight refusal on one of several LLM calls within the turn is an implementation detail. The runtime wraps it with the underlying error as causedBy, so callers who want the detail can still reach it.
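One way the re-classification could look, as a sketch only. The error classes below are bare stand-ins for the platform's real ones, and callAdapter is a hypothetical wrapper name; only the causedBy field name comes from this ADR.

```typescript
// Hypothetical stand-ins for the platform's error classes.
class LLMBudgetExceededError extends Error {}
class LLMRateLimitError extends Error {}

class TurnBudgetExceededError extends Error {
  // causedBy keeps the adapter-level detail reachable for callers who want it.
  readonly causedBy: unknown;
  constructor(message: string, causedBy?: unknown) {
    super(message);
    this.name = "TurnBudgetExceededError";
    this.causedBy = causedBy;
  }
}

// Wraps a single adapter call: only the budget refusal is re-classified.
async function callAdapter<T>(generate: () => Promise<T>): Promise<T> {
  try {
    return await generate();
  } catch (err) {
    if (err instanceof LLMBudgetExceededError) {
      throw new TurnBudgetExceededError("turn budget exceeded during LLM pre-flight", err);
    }
    throw err; // LLMAuthError, LLMRateLimitError, etc. propagate unchanged
  }
}
```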
Autonomy enforcement: two layers
Tools that are not in task_constraints.allowed_tools are:
- Filtered out of what’s offered to the model. The model never sees them in ModelRequest.tools. Defense in depth.
- Hard-rejected if the model calls one anyway. Throws AutonomyBoundaryError with violation kind and attempted tool name. The model can still hallucinate a tool name; the runtime must not execute it, and the event needs to be grep-worthy.
Contrast with unknown tools (resolver returns null): those are soft errors returning tool_result with is_error=true. The distinction matters — an unknown tool is typically a hallucination that the agent can recover from; a disallowed tool is a policy violation that should terminate.
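The two layers could be sketched as a filter plus a guard. ToolSpec, offeredTools, assertAllowed, and the "tool_not_allowed" violation kind are all hypothetical names chosen for the example; the ADR specifies only that the error carries a violation kind and the attempted tool name.

```typescript
interface ToolSpec { name: string; description: string }

class AutonomyBoundaryError extends Error {
  readonly violation: string;
  readonly attemptedTool: string;
  constructor(violation: string, attemptedTool: string) {
    super(`${violation}: ${attemptedTool}`);
    this.name = "AutonomyBoundaryError";
    this.violation = violation;
    this.attemptedTool = attemptedTool;
  }
}

// Layer 1: the model is only offered tools on the allow-list.
function offeredTools(all: ToolSpec[], allowed: string[]): ToolSpec[] {
  return all.filter((t) => allowed.includes(t.name));
}

// Layer 2: a call to anything off the allow-list terminates the turn.
function assertAllowed(toolName: string, allowed: string[]): void {
  if (!allowed.includes(toolName)) {
    throw new AutonomyBoundaryError("tool_not_allowed", toolName);
  }
}
```

The guard runs before resolution, so a disallowed name is rejected regardless of whether the resolver knows it.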
Three new error classes
| Class | Severity | When | Caller action |
|---|---|---|---|
| TurnBudgetExceededError | warn | Cumulative time/cost crossed task_constraints | Raise budget or accept partial |
| MaxIterationsError | error | Tool loop ran past maxIterations cap | Investigate (stuck loop) or raise cap |
| AutonomyBoundaryError | error | Agent attempted a disallowed action | Investigate (security-adjacent) |
Same “one class per distinct caller action” pattern as the seven LLM error classes. Each has a static CODE, a default severity that can be overridden per-instance, and documented context field expectations.
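A sketch of what that pattern might look like in code. The base class here is a simplified stand-in for AgentPlatformError, and the CODE strings, constructor signatures, and context fields are assumptions, not the shipped definitions.

```typescript
type Severity = "info" | "warn" | "error";

// Hypothetical base; the real AgentPlatformError is defined by the platform.
class AgentPlatformError extends Error {
  static readonly CODE: string = "AGENT_PLATFORM_ERROR";
  readonly severity: Severity;
  readonly context: Record<string, unknown>;
  constructor(message: string, severity: Severity, context: Record<string, unknown> = {}) {
    super(message);
    this.severity = severity;
    this.context = context;
  }
}

class TurnBudgetExceededError extends AgentPlatformError {
  static readonly CODE = "TURN_BUDGET_EXCEEDED"; // assumed code string
  // Default severity is warn; a caller can override it per instance.
  constructor(context: { budget: "time" | "cost" }, severity: Severity = "warn") {
    super(`turn ${context.budget} budget exceeded`, severity, context);
  }
}

class MaxIterationsError extends AgentPlatformError {
  static readonly CODE = "MAX_ITERATIONS"; // assumed code string
  constructor(context: { iterations: number }, severity: Severity = "error") {
    super(`tool loop exceeded ${context.iterations} iterations`, severity, context);
  }
}

// Per-instance override: escalate a budget failure for a critical task.
const escalated = new TurnBudgetExceededError({ budget: "cost" }, "error");
console.log(TurnBudgetExceededError.CODE, escalated.severity);
```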
Structured logging: four event types
- turn_started: agent, task, task_type, tier
- tool_call: agent, task, tool, duration_ms, outcome (ok/error)
- turn_failed: agent, task, duration_ms, counters, error_class (when the runtime throws)
- turn_completed: agent, task, duration_ms, counters, cost_usd (emitted on all exit paths via finally)
The finally-emitted turn_completed is the key observability property: regardless of whether the turn succeeded or threw, there’s one line you can grep for per turn. Combined with the adapter’s per-call llm_call entries and the runtime’s tool_call entries, a single turn produces a complete, grep-able trace.
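The finally-based emission can be illustrated with a small wrapper. The Logger and LogEntry shapes and the withTurnLogging helper are hypothetical, and the counters, cost_usd, task_type, and tier fields are omitted to keep the sketch short.

```typescript
interface LogEntry { event: string; [field: string]: unknown }
interface Logger { log(entry: LogEntry): void }

// Wraps a turn body so turn_completed is emitted on every exit path.
async function withTurnLogging<T>(
  logger: Logger,
  ids: { agent: string; task: string },
  body: () => Promise<T>,
): Promise<T> {
  const started = Date.now();
  logger.log({ event: "turn_started", ...ids });
  try {
    return await body();
  } catch (err) {
    // Failure path: record the error class, then rethrow unchanged.
    const error_class = err instanceof Error ? err.constructor.name : "unknown";
    logger.log({ event: "turn_failed", ...ids, duration_ms: Date.now() - started, error_class });
    throw err;
  } finally {
    // Runs after both the return and the rethrow above.
    logger.log({ event: "turn_completed", ...ids, duration_ms: Date.now() - started });
  }
}
```

With this shape a failed turn logs turn_started, turn_failed, turn_completed in that order, while a successful turn logs only turn_started and turn_completed.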
Consequences
- Bars 5, 8, and 10 are now fully mechanical across the entire turn layer. Previous ADRs made them mechanical per-LLM-call; this ADR extends enforcement to the turn level where task constraints actually live.
- A consumer can run an agent turn in ~10 lines. Construct adapter, construct runtime, call runTurn(definition, task, context). Observability, budget enforcement, error typing, and the tool loop are all handled.
- Delegation is visibly deferred. The runtime handles one agent, not a graph. Attempts to delegate (via a hypothetical future tool_use on a “delegate” tool) would be trapped by the autonomy enforcement, as would any orchestrator-level concern. That’s correct for this session: the orchestrator is its own ADR.
- Memory and conversation-history are visibly deferred. The runtime takes a ContextBundle as input (already including working/long-term memory layers from the assembler) and does not append to it during the turn. Multi-turn conversations are the caller’s responsibility until the memory ADR lands.
- The consumer layer is thin by design. The runtime does orchestration, not policy. Retry policy, long-running-task splitting, delegation, and human-in-the-loop approval are all the caller’s concern, implemented above the runtime using the typed errors as control-flow signals.
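As an illustration of typed errors as control-flow signals, a caller-owned retry wrapper might look like the sketch below. The runWithRetry helper, its linear backoff, and the choice to retry only LLMRateLimitError are assumptions for the example; the error classes are bare stand-ins.

```typescript
// Stand-ins for the platform's typed errors.
class LLMRateLimitError extends Error {}
class TurnBudgetExceededError extends Error {}

// Caller-owned retry policy layered above runTurn; the runtime itself never retries.
async function runWithRetry<T>(
  runTurn: () => Promise<T>,
  maxAttempts: number,
  delayMs: number,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await runTurn();
    } catch (err) {
      // Only rate limits are treated as retriable here; everything else surfaces.
      const retriable = err instanceof LLMRateLimitError && attempt < maxAttempts;
      if (!retriable) throw err;
      // Linear backoff: wait longer after each failed attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
}
```

A background job could pass a large maxAttempts while a user-facing request passes 1, which is the kind of context-dependent choice the runtime cannot make on the caller's behalf.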
Consequences for the repo
- @agent-platform/runtime now depends on @agent-platform/llm and @agent-platform/logger. The dependency graph correctly reflects orchestration-on-top-of-adapter-and-logger.
- @agent-platform/core gains a single new interface (ToolResolver). No breaking changes.
- 46 new tests (27 error conformance + 19 runtime integration). Workspace total: 354 passing + 2 skipped.
- The first end-to-end agent turn is now possible. Every subsequent Phase 1 component (memory, tool registry, orchestrator) sits on top of a working turn executor rather than a theoretical one.
Alternatives considered
- New package @agent-platform/agent-runtime. Rejected: the context assembler and turn executor share the same customer, the same mental model, and the same ADR chain. Two packages would create an artificial boundary with no payoff.
- Tool resolver in @agent-platform/llm. Rejected: tools are a first-class platform concept (Tool, ToolCall, etc. already live in core), not an LLM concept. Putting the resolver with the LLM package would couple them unnecessarily.
- Single-round runtime: runTurn returns after one LLM call, caller handles the tool loop. Simpler implementation but pushes the loop onto every consumer, which defeats bar-5 (per-turn tracing would be per-consumer-reimplemented) and bar-10 (budget enforcement would be per-consumer-reimplemented). The loop is exactly the shared thing worth centralizing.
- Streaming runTurn returning an async iterable of events. Cleaner for UIs that want to show reasoning as it happens; adds real complexity to budget enforcement (streaming costs are harder to accumulate mid-call) and requires the adapter interface to support streaming (deferred per ADR-0019). When a UI use case actually demands this, we add runTurnStream as a second method, additive to the interface rather than a reshape.
- Include retry logic in the runtime. Tempting because many failure modes are retriable. Rejected: retry policy depends on context the runtime doesn’t have (is this a user-facing request where latency matters more than success? is this a background job where we want aggressive retry?). The typed error classes give the caller everything needed to decide. Retry lives above the runtime, not inside.
- AgentRuntime as a class rather than a factory + interface. Factory matches createAnthropicAdapter, keeps the implementation private, and lets us swap implementations without breaking consumer type references.
- Run the runtime inside a Worker Durable Object automatically. Actions like “single-instance-per-agent” or “cross-turn state persistence” feel naturally runtime-shaped. Deferred: Durable Object ownership is an orchestration concern, not a turn-execution concern. The orchestrator ADR (later) decides that.
- Enforce requires_human_approval inside the runtime. Tempting because AutonomyBoundaryError has that in its surface. Deferred: human-in-the-loop flows require a persistence and callback mechanism that doesn’t exist yet. When the orchestrator ships, we revisit. For now the runtime recognizes the field exists but doesn’t enforce it; a turn that would trigger approval just runs to completion.