Skip to content

ADR-0022 — Delegation as tool

The unification trick. The single design choice that keeps the runtime small enough to fit in your head.

Multi-agent systems need a way for one agent to invoke another. “Triage” needs to delegate to “refund_decision”; “refund_decision” needs to delegate to “communication.” How does that work mechanically?

The instinct is to add a separate API: a delegate() function, maybe a Delegation type, a special control flow path in the runtime that branches on “is this a delegation or a tool call?” That’s two execution paths to maintain and two sets of failure modes to reason about.

This ADR settles a different answer: delegation IS a tool call. There is no separate delegation API. When agent A lists agent B as a sub-agent, the runtime synthesizes a tool named delegate_to_<B> whose input is the task spec for B. The runtime executes that tool the same way it executes any other tool — recursively invoking itself with B’s definition.

The unification has three downstream consequences that compound:

One execution path. The tool loop in ADR-0021 doesn’t need a “is this a delegation?” branch. Every iteration just asks: did the LLM stop, or did it ask for a tool? If a tool, run it. The fact that the tool happens to be a recursive runtime call is the toolResolver’s concern, not the loop’s.

One set of safety primitives. Budgets, allow-lists, iteration caps, error semantics — they all apply uniformly. A sub-agent’s time_budget_ms is just the parent tool call’s remaining time budget. A sub-agent’s allowed_tools is enforced the same way the parent’s was. There’s no “sub-agent budget” or “delegation depth limit” as separate concepts; everything is just nested tool calls with budget-aware composition.

One set of failure modes. A sub-agent that throws gets caught the same way a tool that throws gets caught. A sub-agent that exceeds budgets surfaces the same error type as any other budget exceedance. Debugging is uniform.

When an agent definition lists another agent as a sub-agent, the platform synthesizes a tool named delegate_to_<sub_agent_name> whose input is a structured task spec. The tool’s handler invokes the runtime recursively with the sub-agent’s definition.

// Auto-generated tool for an agent whose sub_agents list
// includes "refund_decision":
{
name: 'delegate_to_refund_decision',
description: '<from refund_decision YAML>',
inputSchema: TaskSpecSchema,
handler: async (input, ctx) => {
return await runtime.runTurn(
refundDecisionDefinition,
taskFromSpec(input),
ctx.contextBundleForSubAgent(),
);
},
}

The agent’s LLM sees delegate_to_refund_decision in its tool list. It calls it. The runtime executes it. The execution happens to recurse. The agent doesn’t know or care.

We considered three alternatives:

A separate delegate() function in the runtime. Forces a branch in the tool loop (“is this a delegation or a tool call?”). Doubles the code path. Doubles the testing surface. Three months later, every new feature has to ask “do we apply this to delegations or tools or both?”

A Delegation type the LLM emits explicitly. The LLM has to know about delegation as a first-class concept. Means the prompt has to teach the LLM the distinction. Adds words to every system prompt for no benefit.

Multi-agent orchestration above the runtime. Move delegation out of the runtime entirely; have the application layer parse agent reports and decide whether to invoke another agent. Possible, but loses the one safety property that matters: you can’t enforce budget composition across agents that don’t share an execution context.

  • Arbitrary delegation depth with budget-aware composition. Agent A delegates to B which delegates to C which delegates to D — each layer’s time_budget_ms is correctly composed as a remaining-budget. The platform enforces a configurable max-depth (default 3) to prevent runaway recursion.
  • The LLM picks the right sub-agent. The agent’s prompt says “you have a delegate_to_refund_decision tool that handles refund decisions.” The LLM treats it like any other tool — uses the description to decide when to call. No new prompting pattern.
  • Sub-agents are pluggable. Adding a new sub-agent is editing the YAML’s sub_agents list. The runtime synthesizes the new delegation tool at next turn.
  • The synthesized tool’s input schema must be expressive. A task spec needs instructions, payload, expected output, time budget, autonomy bounds. We picked a schema that handles Phase 1’s needs; complex multi-step delegation chains might push on it.
  • Delegation depth is global, not per-agent. A misconfigured agent that recurses into itself hits the global max-depth. Per-agent depth limits are a future refinement.
  • Per-tool error categorization applies. A sub-agent that fails for an AutonomyBoundaryError reason returns a tool_result with is_error=true. The parent agent can recover. This is sometimes the right behavior and sometimes not — a security-critical sub-agent failure probably should not be recoverable. Today’s answer is “code your sub-agent to throw a hard error type”; future ADRs might add a declarative way to mark some sub-agent failures as un-recoverable.

For the original ADR with full Context / Decision / Consequences / Alternatives sections, see ADR-0022 source.

Related decisions:

  • ADR-0021 — agent runtime + tool loop (the engine that executes the synthesized delegation tools)
  • ADR-0023 — tool registry (where synthesized delegation tools land alongside hand-registered ones)
  • ADR-0031 — YAML agent definition format (where the sub_agents list lives)