
ADR-0031: YAML agent definition format

Status: Accepted
Date: 2026-05-02
Related:

  • ADR-0006 (six-layer context model; agent definitions own Layers 1 and 2)
  • ADR-0023 (tools are referenced by name; loose binding from agent definition)

Agent definitions in @agent-platform/core are typed as AgentDefinition — a frozen, validated structure containing metadata, core context, characteristics, tool refs, sub-agent refs, memory config, autonomy boundaries, escalation rules, and a model tier. The shape is locked.

Until now, every agent definition has been authored in TypeScript: apps/example/src/agents.ts, apps/worker/src/agents.ts, apps/worker/src/merchandising.ts. This works for the platform’s own scaffolding, but it isn’t the platform vision. The vision is that agents are data: editable by operators, manageable in version control, and (eventually) configurable through a UI without recompiling the worker.

This ADR locks the on-disk format and the loader contract that produces validated AgentDefinition instances from that format.

Agent definitions live as YAML files on disk. A new @agent-platform/agent-loader package reads, validates, and resolves them into AgentDefinition instances at load time (worker startup, or build time for bundled deployments).

Format choices (the seven things deliberated)


1. File layout: one YAML file per agent.

agents/triage.yaml, agents/refund-decision.yaml, etc. Multi-agent files were considered and rejected: they invite ordering games (one agent’s escalation rule references a sibling agent in the same file), tangle PR review (a one-line agent edit is buried in a long file), and don’t map cleanly to a future UI’s “list of agents → click one to edit” flow.

The cost is that “topology” — which agents work together in a deployment — must be declared somewhere else. Today this is implicit (the worker’s agent host loads the directory and registers everything). When the platform supports multiple deployments per worker, an agent-set.yaml topology manifest will land. Deferred.

2. System prompt: schema accepts string OR { file: <path> }.

core_context:
  system_prompt: "Inline prompt for short cases."
# OR:
core_context:
  system_prompt:
    file: ./prompts/triage.md

Production system prompts are often 100+ lines. Editing them inside YAML’s quoting rules is painful; markdown editors give syntax highlighting; PR diffs of prompt edits read cleanly when the prompt is its own file. But forcing a separate file for every two-line sub-agent is overkill. Both forms are supported; the loader resolves the file form to a string before AgentDefinitionSchema validates.
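A minimal sketch of the resolve step follows. It assumes a reader shaped as (path: string) => Promise<string>; the loader's real FileReader type may differ. The helper name resolveSystemPrompt and the RawCoreContext type are hypothetical. The point being illustrated is ordering: the file form becomes a plain string before schema validation ever sees it.

```typescript
// Hypothetical reader shape: turns a prompt path into its text.
type FileReader = (path: string) => Promise<string>;

// Raw (pre-validation) shape of system_prompt as it comes out of YAML.
type RawSystemPrompt = string | { file: string };

interface RawCoreContext {
  system_prompt: RawSystemPrompt;
  [key: string]: unknown;
}

// Replace `system_prompt: { file }` with the file's contents so the
// downstream schema only ever sees `system_prompt: string`.
async function resolveSystemPrompt(
  core: RawCoreContext,
  reader?: FileReader,
): Promise<RawCoreContext & { system_prompt: string }> {
  const prompt = core.system_prompt;
  if (typeof prompt === "string") {
    // Inline form: nothing to resolve.
    return { ...core, system_prompt: prompt };
  }
  if (!reader) {
    // File form without a reader cannot be resolved; fail loudly.
    throw new Error(
      `system_prompt references file "${prompt.file}" but no reader was supplied`,
    );
  }
  return { ...core, system_prompt: await reader(prompt.file) };
}
```

Keeping this step outside the schema means the schema can stay strict: it never has to accept the { file } form.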

3. Schema validation: Zod.

Consistent with the rest of the codebase. The existing AgentDefinitionSchema in @agent-platform/schemas is the validation target — the loader doesn’t add new schemas, it adapts YAML input to that schema’s shape.

JSON Schema was considered for its tooling ecosystem. Rejected: we don’t expose YAML schemas externally; internal consistency wins. Zod → JSON Schema conversion is available if/when we need it (third-party agent definitions, schema-driven UI form generators).

4. Tool references: loose, by name string.

tools:
- shopify_get_order_by_email
- send_email
- emit_event

Tools are inherently dynamic (MCP tools load at runtime; built-in tools are name-resolved against ToolRegistry). Tight typed references would couple agent files to TypeScript imports — making agents code, not data. The loader validates names against the known tool list at load time; typos fail loudly at startup, not at build time.
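The load-time name check can be as simple as a set difference. The sketch below assumes the loader receives the known tool names as an iterable, for example names from ToolRegistry plus any MCP tools discovered at startup. checkToolRefs is a hypothetical helper, not the package's API.

```typescript
// Report every unknown tool name at once rather than failing on the first,
// so an operator fixing typos sees the whole list in one startup attempt.
function checkToolRefs(
  agentName: string,
  toolRefs: readonly string[],
  knownTools: Iterable<string>,
): void {
  const known = new Set(knownTools);
  const unknown = toolRefs.filter((name) => !known.has(name));
  if (unknown.length > 0) {
    throw new Error(
      `agent "${agentName}" references unknown tool(s): ${unknown.join(", ")}`,
    );
  }
}
```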

5. Sub-agent references: loose, by name string.

Same reasoning. A future “compile” step could verify the sub-agent graph is acyclic; deferred.
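The deferred acyclicity check could be a standard depth-first search over the name graph. The sketch assumes sub-agent refs have already been collected into a map from each agent name to the names of its sub-agents. findSubAgentCycle is hypothetical and not part of the shipped package.

```typescript
// Sub-agent graph: agent name -> names of the sub-agents it may invoke.
type SubAgentGraph = ReadonlyMap<string, readonly string[]>;

// Depth-first search with "visiting"/"done" marks. Returns one cycle as a
// list of agent names (first name repeated at the end), or null if acyclic.
// Names missing from the graph are treated as leaves; checking that they
// exist at all is a separate step.
function findSubAgentCycle(graph: SubAgentGraph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];

  const visit = (name: string): string[] | null => {
    const s = state.get(name);
    if (s === "done") return null;
    if (s === "visiting") {
      // `name` is already on the current path: slice out the loop.
      return [...stack.slice(stack.indexOf(name)), name];
    }
    state.set(name, "visiting");
    stack.push(name);
    for (const child of graph.get(name) ?? []) {
      const cycle = visit(child);
      if (cycle) return cycle;
    }
    stack.pop();
    state.set(name, "done");
    return null;
  };

  for (const name of graph.keys()) {
    const cycle = visit(name);
    if (cycle) return cycle;
  }
  return null;
}
```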

6. Versioning: apiVersion: agent-platform/v1 mandatory.

apiVersion: agent-platform/v1
kind: Agent

A Kubernetes-style envelope. It keeps two kinds of versioning separate:

  • Per-agent versioning (metadata.version: 0.1.0) is operator bookkeeping.
  • Platform apiVersion is loader bookkeeping. When v2 lands, the loader can support v1 and v2 simultaneously for a deprecation window. Old agents keep working; new ones use the new shape.

The cost is one extra mandatory line per file. Trivial. The benefit is a clean upgrade path that doesn’t exist if agents declare nothing about which schema version they target.
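Here is a sketch of what the envelope check in src/envelope.ts might look like. The list of supported versions is an assumption about how a deprecation window could be expressed. checkEnvelope and SUPPORTED_API_VERSIONS are hypothetical names.

```typescript
// apiVersions this (hypothetical) loader build understands. When v2 lands,
// adding it here lets v1 and v2 files load side by side during deprecation.
const SUPPORTED_API_VERSIONS = ["agent-platform/v1"] as const;
type ApiVersion = (typeof SUPPORTED_API_VERSIONS)[number];

interface Envelope {
  apiVersion: ApiVersion;
  kind: "Agent";
}

// Validate the envelope before anything else, so the loader knows which
// schema version to apply to the rest of the document.
function checkEnvelope(doc: unknown): Envelope {
  if (typeof doc !== "object" || doc === null || Array.isArray(doc)) {
    throw new Error("invalid envelope: document must be a mapping");
  }
  const { apiVersion, kind } = doc as Record<string, unknown>;
  if (apiVersion === undefined) {
    throw new Error("invalid envelope: apiVersion required");
  }
  if (!(SUPPORTED_API_VERSIONS as readonly unknown[]).includes(apiVersion)) {
    throw new Error(`invalid envelope: unsupported apiVersion ${String(apiVersion)}`);
  }
  if (kind !== "Agent") {
    throw new Error(`invalid envelope: expected kind Agent, got ${String(kind)}`);
  }
  return { apiVersion: apiVersion as ApiVersion, kind: "Agent" };
}
```

Because the envelope is checked first, the returned apiVersion can select which schema validates the rest of the file.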

7. File location: loader code in packages/agent-loader/; YAML files in apps/worker/agents/.

Clean separation:

  • packages/* is reusable code (shared infrastructure)
  • apps/* owns its config (deploy-time artifacts)

Multiple apps can use the same loader against different agent directories. No cross-app coupling.

Two-layer API:

Primitive (runtime-agnostic):

loadAgentFromString(yamlSource: string, opts?: { reader?: FileReader; source?: string }): Promise<AgentDefinition>

Works in the Worker, in Node, and in tests. The caller supplies the YAML as a string. The optional reader resolves system_prompt: { file: ... } references, and the optional source is an identifier used to make error messages clear.
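To make the stage order concrete, here is a minimal sketch of the pipeline behind loadAgentFromString: parse → envelope → resolve → schema validate, as listed for src/loader.ts. The stages are injected, so the sketch assumes nothing about which YAML parser or schema library is used. runLoadPipeline and PipelineStages are hypothetical names, and the real function's internals may differ.

```typescript
// Stages are injected so the sketch stays runtime-agnostic: no fs, no
// particular YAML or schema library. All names here are hypothetical.
interface PipelineStages<T> {
  parse: (yamlSource: string) => unknown;       // YAML text -> plain object
  checkEnvelope: (doc: unknown) => void;        // apiVersion + kind
  resolve: (doc: unknown) => Promise<unknown>;  // system_prompt file refs -> strings
  validate: (doc: unknown) => T;                // schema validation
}

// Run the stages in a fixed order and prefix every failure with the
// caller-supplied source label and the name of the stage that failed.
async function runLoadPipeline<T>(
  yamlSource: string,
  stages: PipelineStages<T>,
  source = "<string>",
): Promise<T> {
  const step = async <R>(name: string, fn: () => R | Promise<R>): Promise<R> => {
    try {
      return await fn();
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      throw new Error(`couldn't load ${source}: ${name} failed: ${reason}`);
    }
  };

  const doc = await step("parse", () => stages.parse(yamlSource));
  await step("envelope", () => stages.checkEnvelope(doc));
  const resolved = await step("resolve", () => stages.resolve(doc));
  return step("validate", () => stages.validate(resolved));
}
```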

Node convenience (uses fs/promises):

loadAgentFromFile(filePath: string): Promise<AgentDefinition>
loadAgentsFromDirectory(dirPath: string): Promise<Map<string, AgentDefinition>>

loadAgentFromFile reads one file and automatically builds a reader that resolves prompt files relative to the YAML file’s directory.
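Assuming Node's node:path semantics, the relative-resolution rule reduces to a single path.resolve call against the YAML file's directory. resolvePromptPath and makeRelativeReader are hypothetical names for illustration.

```typescript
import { readFile } from "node:fs/promises";
import * as path from "node:path";

// Resolve a prompt reference relative to the directory of the YAML file
// that mentions it; absolute references pass through unchanged.
function resolvePromptPath(yamlPath: string, ref: string): string {
  return path.resolve(path.dirname(yamlPath), ref);
}

// Build a reader bound to one YAML file, roughly what a file-based
// convenience loader would hand to the string-based primitive.
function makeRelativeReader(yamlPath: string): (ref: string) => Promise<string> {
  return (ref) => readFile(resolvePromptPath(yamlPath, ref), "utf8");
}
```

This sketch does not stop ../ references from leaving the agents directory. The ADR does not say whether the real loader restricts that.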

loadAgentsFromDirectory walks a directory (non-recursively), loads every *.yaml/*.yml file, and returns a Map keyed by metadata.name. Duplicate names across files are a fatal error.
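The file filter and the duplicate-name rule can be sketched as follows. The sketch assumes each loaded definition has already been paired with the file it came from. isAgentFile, indexByName and LoadedAgent are hypothetical, and the extension match is case-sensitive by assumption.

```typescript
// Matches *.yaml and *.yml (case-sensitive in this sketch).
const isAgentFile = (fileName: string): boolean => /\.ya?ml$/.test(fileName);

interface LoadedAgent {
  source: string; // file the definition came from
  name: string;   // metadata.name
}

// Index loaded definitions by metadata.name. A name claimed by two files
// is fatal, and the error names both files so the operator can pick one.
function indexByName<T extends LoadedAgent>(agents: readonly T[]): Map<string, T> {
  const byName = new Map<string, T>();
  for (const agent of agents) {
    const existing = byName.get(agent.name);
    if (existing) {
      throw new Error(
        `duplicate agent name "${agent.name}" in ${existing.source} and ${agent.source}`,
      );
    }
    byName.set(agent.name, agent);
  }
  return byName;
}
```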

The Worker has no fs at runtime. YAML files and prompt files are bundled at build time (esbuild text loader, wrangler text_blobs, or raw import of .yaml strings — to be locked in a future commit when the scenario lands). The loader’s loadAgentFromString works in the Worker because it doesn’t touch disk; bundled strings feed in directly with a custom reader.
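Because the bundling mechanism is not locked yet, the sketch below assumes only that it ends up producing a map from prompt path to prompt text. createBundledReader is hypothetical. It normalises a leading ./ so that ./prompts/triage.md and prompts/triage.md hit the same entry.

```typescript
// Hypothetical: prompt files inlined as strings at build time, keyed by the
// path an agent YAML uses in `system_prompt: { file: ... }`.
type BundledFiles = Readonly<Record<string, string>>;

// Strip leading "./" segments so equivalent relative spellings match.
const normalize = (p: string): string => p.replace(/^(\.\/)+/, "");

// A reader that never touches disk: it looks paths up in the bundle and
// fails loudly if a referenced prompt was not bundled.
function createBundledReader(files: BundledFiles): (ref: string) => Promise<string> {
  const byPath = new Map(
    Object.entries(files).map(([k, v]) => [normalize(k), v] as const),
  );
  return async (ref) => {
    const text = byPath.get(normalize(ref));
    if (text === undefined) {
      throw new Error(`prompt file not bundled: ${ref}`);
    }
    return text;
  };
}
```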

Not in scope for v1:

  • Hot reload. Agents load once at startup. Reloading at runtime is a security hole (untrusted YAML mutates running agents) and a complexity vector. Not in v1.
  • Runtime YAML loading from R2/KV. Useful for multi-tenant where operators upload agents through a UI. Build-time bundling is enough for v1. Defer until needed.
  • Topology / agent-set manifests. When we have multiple deployments, this becomes a real concern. Defer until then.
  • Migration of existing hardcoded agents. The apps/example/src/agents.ts and apps/worker/src/agents.ts definitions stay. Migration is a separate, opt-in commit.

All loader errors are wrapped as ConfigError (the existing class in @agent-platform/errors) with file path context. Operators see “couldn’t load agents/triage.yaml: invalid envelope — apiVersion required” instead of a raw ZodError stack. The ZodError’s issues array is preserved in ConfigError.context for tooling that wants structured access.
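Here is a sketch of the wrapping. The real ConfigError's constructor is not described here, so a local stand-in class is used. SchemaIssue mirrors only the path and message fields of a Zod issue, and toConfigError is a hypothetical helper.

```typescript
// Minimal subset of a Zod issue: where validation failed and why.
interface SchemaIssue {
  path: readonly (string | number)[];
  message: string;
}

// Stand-in for the real ConfigError in @agent-platform/errors, whose
// constructor signature is not specified in this ADR.
class ConfigErrorSketch extends Error {
  readonly context: Record<string, unknown>;
  constructor(message: string, context: Record<string, unknown>) {
    super(message);
    this.name = "ConfigError";
    this.context = context;
  }
}

// Turn schema issues into one operator-readable line per issue, while
// keeping the raw issues available under context.issues for tooling.
function toConfigError(source: string, issues: readonly SchemaIssue[]): ConfigErrorSketch {
  const lines = issues.map((issue) => {
    const where = issue.path.length > 0 ? issue.path.join(".") : "(root)";
    return `  ${where}: ${issue.message}`;
  });
  return new ConfigErrorSketch(`couldn't load ${source}:\n${lines.join("\n")}`, {
    source,
    issues,
  });
}
```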

Positive:

  • Agent definitions are now data. Operators can edit YAML without TypeScript knowledge.
  • Validation is strict: whatever loads conforms exactly to AgentDefinition.
  • File-per-agent layout maps cleanly to future tooling (UI, validation tools, git diffs).
  • The apiVersion envelope means future schema evolution doesn’t break old files.
  • Loose tool/sub-agent references decouple agent files from build-time TypeScript.

Negative:

  • One more file format to learn (though YAML is widely known).
  • Operators can author syntactically valid YAML that fails schema validation. Loud errors at startup mitigate this; future YAML-LSP schema integration would catch it earlier.
  • Bundling YAML into the Worker adds a small amount of build-time complexity. Acceptable.
  • The loose tool name approach means typos surface at agent load time, not build time. Deemed acceptable; load happens at startup, errors are fatal and loud.

@agent-platform/agent-loader package shipped 2026-05-02:

  • src/envelope.ts — apiVersion + kind validation
  • src/resolve.ts — system_prompt file ref resolution
  • src/loader.ts — main pipeline (parse → envelope → resolve → schema validate)
  • src/directory.ts — Node convenience: file + directory loaders
  • src/index.ts — public exports

~30 tests covering the full pipeline, edge cases, and error handling.

Future work:

  • YAML LSP / schema in editors. A .schema.json published in the repo would let editors validate agent YAML on save. Future work; not blocking.
  • Topology files. When the platform supports multiple deployments per worker, a separate topology format will be needed. The shape isn’t decided yet.