ADR-0022: Delegation as tool

Status: Accepted Date: 2026-04-22

Context

With the agent runtime shipped (ADR-0021) the platform can run one agent end-to-end, but still cannot run two — the validation target that distinguishes “a working AI platform” from “a framework with a working LLM call.”

The original design doc lists four agent communication patterns:

Parent → Sub-Agent delegation (the subject of this ADR)
Peer → Peer requests
Broadcast events
Sub-Agent → Parent reporting (implicitly covered by #1’s return value)

Patterns 2–4 involve either bidirectional routing or async event flow — both require stateful orchestration and in Pattern 3’s case likely a queue (Cloudflare Queues + Durable Objects). Pattern 1 is the minimum viable multi-agent primitive and the one that directly unlocks the validation milestone (“two agents actually talking to each other”).

The design question is what shape Pattern 1 should take. The options that came up in deliberation:

(a) A delegate() primitive. A function the runtime exposes; the parent agent’s application code calls it when the model asks to delegate.
(b) A stateful Orchestrator class. Long-lived, tracks ongoing delegations, enforces depth and concurrency limits in state.
(c) An event bus with queued task delivery and subscription.
(d) Delegation as a tool. The sub-agent is wrapped as a Tool; the parent’s existing tool loop invokes it like any other capability.

Decision

(d) — delegation is a tool call. createDelegationTool(options) returns a Tool whose handler runs the sub-agent’s turn and returns the resulting AgentReport as tool output.

Why a tool, not a primitive or an orchestrator

The parent’s tool loop (ADR-0021) already enforces the invariants a delegation primitive would need:

Budget composition. The tool handler receives budget_remaining_usd via ToolExecutionContext; threading this into the sub-agent’s TaskConstraints.cost_budget_usd gives bar-10 preservation end-to-end at zero additional plumbing.
Structured logging. tool_call events already record delegation attempts at the parent level; turn_started / turn_completed from the sub-agent record the delegation internally. Two logs per delegation, grep-able by event name, without new event types.
Autonomy enforcement. Disallowed tools are already filtered and checked; disallowed sub-agents reuse the same surface via the tool being named delegate_to_<sub_agent_name>.
Error recovery. A sub-agent failure can become a soft tool_result, letting the parent try a different approach — which is exactly what “retry / escalate” means in the design doc’s agent communication patterns.

A dedicated delegate() primitive would replicate most of this machinery. A stateful Orchestrator would add real value only when the parent has concurrent sub-agents or async delegation — neither of which we need for the validation target. Both options earn their complexity only when Pattern 2 or Pattern 3 ships; today they are speculative.

An event bus (Pattern 3) is a future ADR, not a replacement for Pattern 1.

Schema change: `Task.delegation_depth`

A new optional field on DelegatedContext (and therefore Task) records the hop count from the root user-initiated task. 0 at the root, incremented by 1 on each delegation. The runtime checks against AutonomyBoundaries.max_delegation_depth before entering the sub-agent’s turn; violation throws AutonomyBoundaryError with violation: 'delegation_depth_exceeded'.

Field is optional for backward compatibility. Tasks constructed before this ADR have delegation_depth: undefined, which the runtime treats as 0 — unchanged behavior. Schema drift check in packages/schemas/src/task.ts updated accordingly; compile-time test still passes.

Error propagation policy (asymmetric on purpose)

When a sub-agent’s turn throws, the delegation tool handler does not uniformly swallow or propagate. Two cases propagate; everything else becomes a structured tool output the parent can reason about:

Error	Propagate or return as tool output?	Why
`AutonomyBoundaryError`	Propagate	Security-adjacent event. Operator must see. Silent recovery would hide violations.
`LLMAuthError`	Propagate	Operator problem (bad API key). No parent-side retry can fix it.
`LLMRateLimitError`, `LLMTimeoutError`, `LLMUnavailableError`, `LLMContextLengthError`, `LLMInvalidRequestError`, `LLMBudgetExceededError`, `TurnBudgetExceededError`, `MaxIterationsError`	Return as tool output with `status: 'failed'`	Parent might retry, simplify the task, or escalate. Information, not an exception.
Non-platform `Error` (bug, uncaught)	Propagate	We never swallow unexpected exceptions.

The policy was extended to the runtime’s general tool-handler catch block (not just the delegation tool). Any tool that throws AutonomyBoundaryError propagates; any tool that throws LLMAuthError propagates. Everything else becomes a soft tool_result. This is a behavioral change from ADR-0021 and is documented here because delegation is the forcing function — but the reasoning applies to every tool.

Budget composition

The parent’s ToolExecutionContext.budget_remaining_usd flows into the sub-agent’s TaskConstraints.cost_budget_usd. When the sub-agent’s runtime then passes the remaining budget to its own adapter calls (per ADR-0021’s turn-level enforcement), the cap composes correctly: the sub-agent cannot spend more than the parent has left.

Time budget is not explicitly composed in this iteration. The parent’s time_budget_ms applies to the parent’s own LLM calls; sub-agent time is observable via duration_ms in the returned report but not automatically constrained. Tracking: when sub-agent runs routinely approach the parent’s time budget, add time budget composition. Documented in What’s Next.

Depth starts at 1, not at whatever was on the parent’s task

The delegation tool handler currently sets delegation_depth: 1 on the sub-agent’s task regardless of the parent’s depth. This is correct only for first-level delegation (the validation target). For deeper chains (parent → sub → sub-sub), the handler would need access to the parent’s current task’s delegation_depth via ToolExecutionContext — a piece of plumbing not yet added. The delegation tool’s depth check against max_delegation_depth still works as a first-order safeguard.

Deferral noted in code and tracked as a follow-up. Reason for deferring: every realistic agent chain in Phase 1 is one level deep; deeper chains are a Phase 2 concern (orchestrator).

Consequences

Two agents can work together. Parent agent has delegate_to_writer in its tools; the tool loop handles the delegation like any other capability. Validation milestone reached.
No new top-level surface. Zero new log events, no new stop reasons, no amendments to the LLM adapter interface. Everything reuses existing machinery.
Error policy is now stated explicitly. Tool-handler errors are no longer a monolithic “always soft-fail” — security-adjacent events propagate. This is a behavioral change to the runtime’s tool loop; documented in this ADR.
Backward-compatible schema. Task.delegation_depth is optional; existing consumers keep working.
Future orchestration patterns don’t break anything. When Pattern 2, 3, or 4 ship, delegation-as-tool remains the Pattern 1 implementation; the orchestrator for deeper patterns lives above it.

Consequences for the repo

New file: packages/runtime/src/delegation.ts (+ tests).
Modified file: packages/core/src/task.ts (added delegation_depth?).
Modified file: packages/schemas/src/task.ts (added delegation_depth to DelegatedContextSchema).
Modified file: packages/runtime/src/agent-runtime.ts (asymmetric error propagation in executeToolCalls).
Tests: 8 new in delegation.test.ts. Workspace total: 362 passed + 2 skipped.

Alternatives considered

Delegation as primitive (runtime.delegate(...)). Cleaner at the API level but duplicates budget, logging, and autonomy machinery that already exists in the tool loop. Rejected: reuse wins over abstraction-for-abstraction’s-sake.
Stateful Orchestrator class. Valuable when the parent has concurrent sub-agents or async delegation. Neither exists today. Rejected as premature; the state is real in Phase 2’s Pattern 3 and would be built there against actual requirements.
Delegation as new stop reason. Amending ADR-0019 to introduce stop_reason: 'delegate' with a corresponding content-block type. Rejected: no semantic gain over tool_use (the model is asking for a capability either way), and amending a two-session-old ADR on speculation about how providers will surface this in the future is the wrong direction.
Queued delegation via Cloudflare Queues. The Phase-2 orchestrator might look like this. Too much infrastructure for Pattern 1 on its own; a synchronous in-process call is exactly what “one agent delegates to another” needs today.
Tool handler swallows ALL errors as soft tool_results. The ADR-0021 default. Rejected here because it would silently hide AutonomyBoundaryError (the class specifically designed to be grep-worthy) and would make bad API keys look like “the sub-agent had some trouble.” The asymmetric policy is the cost-weighted right answer.
Propagate ALL sub-agent errors. Opposite failure mode — every transient rate limit from a sub-agent would abort the parent’s turn. Defeats the whole point of structured agent communication (the parent should be able to retry or escalate).
Uniform “delegate” tool name with sub_agent in the input. One tool instead of N delegate_to_<name> tools. Considered. Rejected because distinct names let the model reason about which capability to invoke (its tools list reads like a menu); and the autonomy enforcement surface is cleaner (each tool name maps to exactly one sub-agent).

What’s Next

Deeper chains. Thread the parent task’s delegation_depth through ToolExecutionContext so nextDepth in the delegation handler is parent.delegation_depth + 1 rather than always 1.
Time budget composition. Analogous to cost budget; deferred until there’s evidence sub-agent runs approach the parent’s time cap.
Pattern 2 (Peer → Peer). Horizontal agent communication. Likely a separate ADR; might reuse delegation-as-tool with a different tool name (request_from_<peer_name>) or might need different semantics.
Pattern 3 (Broadcast events). The real orchestrator. Cloudflare Queues + Durable Objects. Separate ADR.

ADR-0022: Delegation as tool

ADR-0022: Delegation as tool

Context

Decision

Why a tool, not a primitive or an orchestrator

Schema change: Task.delegation_depth

Error propagation policy (asymmetric on purpose)

Budget composition

Depth starts at 1, not at whatever was on the parent’s task

Consequences

Consequences for the repo

Alternatives considered

What’s Next

Schema change: `Task.delegation_depth`