Core concepts

The platform is built around four primitives. Once you understand them, every example, every scenario, and every architectural choice makes sense as a variation on these four ideas.

An agent is a definition file that says who an LLM-powered actor is and what it’s allowed to do. In the platform, agent definitions are YAML — one file per agent.

Every agent has:

  • An identity. A name, a role (main for top-level agents that receive tasks; sub_agent for ones that only get delegated to), and a model tier (main agents typically run on Claude Sonnet; sub-agents on Haiku to save cost).
  • A system prompt. The persistent instructions that define how the agent thinks. This is where business expertise gets encoded: tone, decision criteria, escalation policy, what counts as “good” work for this agent’s role.
  • A characteristics block. Personality, decision style, tone. Slightly fuzzier than the system prompt but reaches the model the same way; mostly useful for keeping agent voice consistent across responses.
  • A tools list. The names of tools this agent is permitted to use. Even if a tool exists in the platform, an agent can only call it if it’s listed here. This is structural, not advisory — the runtime enforces it.
  • A sub-agents list. Other agents this agent can delegate to. Delegation is a tool call: when triage decides it needs the refund-decision agent’s judgment, it calls a delegate_to_refund_decision tool with structured inputs, and the runtime spins up the sub-agent with its own context, runs its own loop, and returns a structured report.
  • A memory configuration. Whether long-term memory is enabled for this agent (most aren’t — only agents that benefit from persistent recall, like refund_decision, opt in), and what the working-memory window size is.
  • Autonomy bounds. How deep the delegation chain can go from this agent (so a buggy agent can’t recurse forever), and which human approvals are required before certain actions can complete (Phase 2 territory).

That’s the entire agent. No code, no class hierarchy, no orchestration logic. The runtime reads the YAML, assembles the right context, and runs the loop.
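Putting those pieces together, an agent definition could look something like the following. This is a hypothetical sketch: the field names are inferred from the concepts above, and the platform's actual YAML schema may spell them differently.

```yaml
# Illustrative agent definition -- field names are assumptions,
# not the platform's confirmed schema.
name: refund_decision
role: sub_agent            # only reachable via delegation, never top-level
model_tier: haiku          # sub-agents typically run on the cheaper tier
system_prompt: |
  You decide whether a refund request should be auto-approved,
  denied, or escalated to a human reviewer.
characteristics:
  tone: concise
  decision_style: conservative
tools:                     # structural allow-list, enforced by the runtime
  - recall_memory
  - store_memory
  - emit_event
sub_agents:
  - communication          # yields an auto-generated delegate_to_communication tool
memory:
  long_term: true          # opt-in; most agents leave this off
  working_window: 20
autonomy:
  max_delegation_depth: 2
```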

A tool is one specific thing an agent can do. The runtime defines the interface: a tool has a name, a description (the LLM reads this to decide whether to call it), an input schema (Zod-validated), and a handler function that does the work and returns structured output.
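That interface can be sketched in TypeScript as follows. The platform uses Zod for input schemas; to keep this example self-contained, a plain `validate` function stands in for a Zod schema's `.parse()`, and the stub handler is illustrative rather than the real Shopify integration.

```typescript
// Minimal sketch of the tool interface: name, description, input
// validation, and an async handler returning structured output.
interface Tool<In, Out> {
  name: string;
  description: string;                 // the LLM reads this to decide whether to call it
  validate: (input: unknown) => In;    // stands in for a Zod schema's .parse()
  handler: (input: In) => Promise<Out>;
}

const getOrderByEmail: Tool<{ email: string }, { orders: string[] }> = {
  name: "shopify_get_order_by_email",
  description: "Look up recent Shopify orders for a customer by email.",
  validate: (input) => {
    const i = input as { email?: unknown };
    if (typeof i?.email !== "string") throw new Error("email must be a string");
    return { email: i.email };
  },
  // Stub handler; the real one would query the Shopify API.
  handler: async ({ email }) => ({ orders: [`order-for-${email}`] }),
};
```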

Three categories of tools exist on the platform:

  • Built-in tools that every agent can opt into: recall_memory (semantic search over the agent’s long-term memory), store_memory (write a new entry), and emit_event (publish an event onto the bus for downstream handling).
  • Business-pack tools that come from a vertical-specific package. In the e-commerce pack, shopify_get_order_by_email looks up recent orders for a customer. Future Phase 2 tools will mutate Shopify (refund, cancel, annotate) once the human-approval gate is designed.
  • Delegation tools that are auto-generated from each agent’s sub_agents list. If triage lists refund_decision as a sub-agent, the runtime synthesizes a delegate_to_refund_decision tool whose input is the task spec for the sub-agent.

Tools are the security boundary. An agent can only do what its tools permit. Adding a new capability is shipping a new tool — not editing the agent.
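The auto-generation of delegation tools can be sketched like this. The `delegate_to_` naming follows the examples in the text; the function and types are illustrative, not the runtime's real internals.

```typescript
// Sketch: synthesize one delegation tool per entry in an agent's
// sub_agents list. The runtime would also attach the sub-agent's
// task-spec input schema; omitted here for brevity.
interface DelegationTool {
  name: string;
  description: string;
}

function synthesizeDelegationTools(subAgents: string[]): DelegationTool[] {
  return subAgents.map((agent) => ({
    name: `delegate_to_${agent}`,
    description: `Delegate a structured task to the ${agent} sub-agent and wait for its report.`,
  }));
}

// For triage, whose sub_agents list contains refund_decision:
const delegationTools = synthesizeDelegationTools(["refund_decision"]);
```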

Memory is the platform’s hardest, most distinctive idea. Every agent reads from up to six layers of context, in strict priority order. Higher layers can never be overridden by lower ones. The runtime enforces this.

| Layer | What it is | Mutability |
|---|---|---|
| 1. Core context | System prompt + hard constraints from the YAML | Immutable. Compiled once, frozen. |
| 2. Characteristics | Personality, decision style, tone | Immutable. |
| 3. Shared context | Read-only data shared across agents (e.g., today’s date, the current tenant) | Read-only. Set per request. |
| 4. Delegated context | Per-task input from a parent agent (the sub-agent’s instructions, payload, expected output schema) | Per-task. Set when delegated. |
| 5. Working memory | The current conversation: messages, tool results, intermediate reasoning | Sliding window per turn. |
| 6. Long-term memory | Persistent vector search per agent, across all past turns | Read-write across turns; opt-in per agent. |

The priority order is the security model. A hostile or buggy sub-agent cannot override the core context of its parent. A user input cannot override the agent’s system prompt. A retrieved memory cannot override the agent’s hard constraints. The platform calls this the validateNoOverride() guarantee, and the runtime applies it on every context assembly.
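The rule can be sketched as a merge that walks the layers in priority order and refuses to let a lower layer change a key a higher layer already set. `validateNoOverride()` is the platform's name for the guarantee; this implementation is an illustrative stand-in.

```typescript
// Sketch of priority-ordered context assembly. Layers come in
// highest-priority-first; a later (lower) layer may add new keys
// but never change an existing one.
type Layer = Record<string, string>;

function assembleContext(layersInPriorityOrder: Layer[]): Layer {
  const out: Layer = {};
  for (const layer of layersInPriorityOrder) {
    for (const [key, value] of Object.entries(layer)) {
      if (key in out && out[key] !== value) {
        // The validateNoOverride() guarantee: reject the attempt.
        throw new Error(`validateNoOverride: lower layer tried to override "${key}"`);
      }
      if (!(key in out)) out[key] = value;
    }
  }
  return out;
}
```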

Long-term memory specifically is implemented as Cloudflare Vectorize (vector search for semantic recall) plus Cloudflare D1 (the source-of-truth row store). When an agent calls recall_memory("refund history for sara@example.com"), the runtime embeds the query, searches Vectorize for the top matches, hydrates the full content from D1, and hands the results back to the agent as a structured tool result. The agent then reasons over what it found.
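The embed → search → hydrate pipeline can be sketched as below. The Vectorize and D1 clients are reduced to minimal interfaces so the example is self-contained; the real Cloudflare bindings expose different method signatures.

```typescript
// Sketch of the recall_memory flow: embed the query, take the top-K
// semantic matches from the vector index, then hydrate full rows from
// the source-of-truth row store.
type Match = { id: string; score: number };

interface VectorIndex { query(embedding: number[], topK: number): Promise<Match[]>; }
interface RowStore { getByIds(ids: string[]): Promise<{ id: string; content: string }[]>; }

async function recallMemory(
  query: string,
  embed: (text: string) => Promise<number[]>,
  index: VectorIndex,
  store: RowStore,
): Promise<{ id: string; content: string }[]> {
  const embedding = await embed(query);             // 1. embed the query
  const matches = await index.query(embedding, 5);  // 2. top-K matches from Vectorize
  return store.getByIds(matches.map((m) => m.id));  // 3. hydrate full content from D1
}
```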

This is why agents remember — not because the model has long context windows, but because the platform persists the right things and surfaces them on demand.

An agent never directly mutates the outside world. When refund_decision decides Sara’s refund should be auto-approved, it doesn’t call Shopify’s refund API. It emits an event:

topic: shopify_actions
payload: {
  action_type: "refund",
  order_id: "gid://shopify/Order/12345",
  amount: "49.00",
  currency_code: "SEK",
  reason: "Auto-approve: under $50, first refund, within 30 days",
  decided_by_agent: "agent-refund-decision"
}

That event lands in a Cloudflare Queue. A separate consumer (today logs-only; Phase 2 will execute the mutation) picks it up, validates the payload, and acts on it.

This separation is the platform’s safety property. The agent recommends; the consumer acts. A human approval gate fits naturally between them: the consumer can route certain events to a human-review queue, wait for approval, and only then execute. The agent’s reasoning is captured for audit; the action’s authorization is captured separately.
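The consumer side of this split can be sketched as a simple router over topics. The routing below mirrors the Phase 2 shape described in the text (review gate for one topic, validate-then-execute for the other); the types and function are illustrative, not the platform's real queue handler.

```typescript
// Sketch: decide what a queue consumer should do with an event.
// In v1 both branches are logs-only; the labels reflect the
// planned Phase 2 behavior.
interface BusEvent {
  topic: "human_review" | "shopify_actions";
  payload: Record<string, unknown>;
}

function routeEvent(event: BusEvent): "log_for_review" | "validate_then_act" {
  switch (event.topic) {
    case "human_review":
      return "log_for_review";     // Phase 2: enqueue for a human-review UI
    case "shopify_actions":
      return "validate_then_act";  // Phase 2: validate payload, then execute the mutation
  }
}
```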

Two topics exist today:

  • human_review — events emitted when an agent decides a case needs human attention. Today’s consumer just logs them; Phase 2 introduces a real review UI.
  • shopify_actions — events emitted when an agent decides something should happen in Shopify. Same logs-only treatment in v1; Phase 2 wires real mutations.

A single triage scenario weaves all four:

  1. An HTTP request hits /run with { agent_name: "triage", instructions: "...", payload: {...} }. The runtime loads the triage agent from YAML.
  2. The agent’s system prompt and characteristics (layers 1+2) are baked into its context.
  3. Shared context (current date, etc.) and delegated context (the request’s instructions and payload) get added (layers 3+4).
  4. Working memory starts empty (layer 5).
  5. Long-term memory isn’t enabled for triage itself — it’s enabled for refund_decision (layer 6 only matters when the chain reaches that sub-agent).
  6. The triage agent’s LLM turn runs. It sees its allowed tools: shopify_get_order_by_email, emit_event, and the auto-generated delegate_to_refund_decision.
  7. It calls shopify_get_order_by_email to look up the customer’s recent orders.
  8. It then calls delegate_to_refund_decision with a structured task. The runtime spins up refund_decision as a sub-agent with its OWN six-layer context (its own system prompt, its own tools, its own long-term memory).
  9. refund_decision calls recall_memory against its long-term memory store. The runtime embeds the query, searches Vectorize, hydrates from D1, returns the matches.
  10. refund_decision reasons, then either calls delegate_to_communication (delegate further), or emit_event (publish to the bus), or both. It returns a structured report to its parent.
  11. triage sees the report, may emit its own events, and returns a final summary to the caller.

Every step uses one or more of the four primitives. There is no fifth primitive. Once you have these, you have multi-agent business automation.
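The entry point in step 1 can be sketched as a typed request body. The field names come from the walkthrough; the endpoint path and payload contents are illustrative.

```typescript
// Sketch of the /run request body that kicks off the triage scenario.
interface RunRequest {
  agent_name: string;
  instructions: string;
  payload: Record<string, unknown>;
}

const runBody: RunRequest = {
  agent_name: "triage",
  instructions: "Customer asks about a refund for a recent order.",
  payload: { customer_email: "sara@example.com" },
};

// e.g. fetch("/run", { method: "POST", body: JSON.stringify(runBody) })
```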

  • Glossary — quick reference for terms used across the docs
  • Tech section (commit 2) — how each of these primitives is implemented in code
  • Scenarios section (commits 5–7) — concrete walkthroughs that build intuition for the patterns