Architecture

The synthesis page. If Concepts explains what the platform’s primitives are and the package tour explains how each one is implemented, this page explains how the whole thing fits together.

Read this after you’ve absorbed the primitives. It maps the six-layer context model onto real Cloudflare resources, walks the path of an HTTP request from edge to LLM to event bus, and states what the platform’s threat model actually defends against.

The platform is one Cloudflare Worker plus six bound Cloudflare resources (a Durable Object, a KV namespace, D1, Vectorize, and two queues) and three external APIs (Anthropic for the LLM, OpenAI for embeddings, Shopify for commerce data). One deploy unit; six runtime resources.

| What | Where | What it does |
| --- | --- | --- |
| Worker (apps/worker) | Cloudflare edge | The runtime. Three handler types: fetch (HTTP API), scheduled (cron), queue (event consumer) |
| AgentJob Durable Object | Cloudflare edge | Runs async agent jobs (/jobs) to completion via alarm-driven execution |
| JOBS_INDEX KV namespace | Cloudflare edge | Discoverable index of submitted async jobs |
| D1 (agent_platform) | Cloudflare edge | SQL database; source of truth for long_term_memory rows |
| Vectorize (agent-platform-lt-memory) | Cloudflare edge | Vector index, 1536d cosine; semantic search for long-term memory |
| human-review queue | Cloudflare edge | Async events for human escalation |
| shopify-actions queue | Cloudflare edge | Async events for Shopify mutations |
| Anthropic (Sonnet / Haiku) | External API | LLM provider for agent reasoning |
| OpenAI (text-embedding-3-small) | External API | Embedding provider for semantic recall |
| Shopify Admin (GraphQL) | External API | Order / product / shop reads |

Inbound: HTTP clients (curl, a browser) hit the Worker via /run (sync) or /jobs (async). Cron triggers come from Cloudflare directly: no HTTP, no auth.

Outbound: the Worker is both producer and consumer of the two queues — same code, different handler types. It calls Anthropic for every agent turn, OpenAI for every memory recall, and Shopify for order lookups.

Each connection is a resource binding declared in apps/worker/wrangler.toml. See the Stack section for per-service depth.
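
To make "same code, different handler types" concrete, here is a minimal sketch of that one-Worker shape in module-Worker style. The binding names (DB, LT_MEMORY, and so on) and the helper functions are assumptions for illustration, not the real apps/worker code:

```ts
// Illustrative sketch: one Worker, three handler types.
// Binding and helper names are assumptions, not the real apps/worker code.
interface Env {
  DB: D1Database;                    // D1 (agent_platform)
  LT_MEMORY: VectorizeIndex;         // Vectorize (agent-platform-lt-memory)
  JOBS_INDEX: KVNamespace;           // async-job index
  HUMAN_REVIEW: Queue;               // human-review queue (producer side)
  SHOPIFY_ACTIONS: Queue;            // shopify-actions queue (producer side)
  AGENT_JOB: DurableObjectNamespace; // AgentJob Durable Object
}

type PlatformEvent = { id: string; topic: string; tenant_id: string; payload: unknown };

declare function routeHttp(request: Request, env: Env): Promise<Response>;
declare function runScheduledAgents(cron: string, env: Env): Promise<void>;
declare function handleEvent(event: PlatformEvent, env: Env): Promise<void>;

export default {
  // HTTP API: /run (sync), /jobs (async), /health
  async fetch(request, env, ctx) {
    return routeHttp(request, env);
  },
  // Cron: invoked by Cloudflare directly; no HTTP request involved
  async scheduled(event, env, ctx) {
    await runScheduledAgents(event.cron, env);
  },
  // Queue consumer: the same Worker that produced the messages
  async queue(batch, env) {
    for (const msg of batch.messages) {
      await handleEvent(msg.body as PlatformEvent, env);
      msg.ack();
    }
  },
} satisfies ExportedHandler<Env>;
```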

The platform’s outermost split, from ADR-0005. Two layers, with a strict dependency direction: business packs depend on platform core, never the reverse.

| Layer | Business-aware? | What's in it (today) | Examples |
| --- | --- | --- | --- |
| Platform Core | No | Runtime, context assembler, memory subsystem, tool registry, event bus, agent loader, model adapters, error taxonomy | packages/core, packages/runtime, packages/memory, packages/event-bus, packages/agent-loader, packages/llm-anthropic, packages/embeddings-openai |
| Business Packs | Yes | Domain-specific tools, integrations, agent definitions | packages/shopify, apps/worker/agents/*.yaml |

The discipline: anything e-commerce-shaped in a Platform Core package is a code-review smell. Adding a new vertical (B2B SaaS, healthcare, fintech) is a new business pack — additive, zero edits to core.

The platform’s most distinctive design choice. Six strictly ordered layers, with the top two immutable. The full reasoning lives in ADR-0006; the six layers in summary:

| Priority | Layer | Source | Mutability |
| --- | --- | --- | --- |
| 1 | Core Context — system prompt, hard constraints | Agent definition (YAML) | Immutable |
| 2 | Characteristics — personality, decision style | Agent definition (YAML) | Immutable |
| 3 | Shared Context — date, tenant, environment flags | Shared store | Read-only by agent |
| 4 | Delegated Context — instructions, payload | Parent agent's task spec | Per-task |
| 5 | Working Memory — session log | Current turn | Ephemeral (sliding window) |
| 6 | Long-term Memory — vector search results | Vectorize + D1 | Persistent (per-agent) |

Higher priority wins on conflict. A retrieved long-term memory cannot override the agent’s hard constraints. A delegated task cannot override the agent’s identity. The runtime enforces this through validateNoOverride() on every context assembly.
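
A minimal sketch of what that check could look like, assuming a flat key-value shape per layer (the real packages/runtime types differ):

```ts
// Illustrative sketch of priority-ordered context assembly (assumed shapes).
type LayerName = "core" | "characteristics" | "shared" | "delegated" | "working" | "longTerm";

// Earlier in this list = higher priority; the first two layers are immutable.
const PRIORITY: LayerName[] = ["core", "characteristics", "shared", "delegated", "working", "longTerm"];
const IMMUTABLE = new Set<LayerName>(["core", "characteristics"]);

type ContextBundle = Record<LayerName, Record<string, unknown>>;

/**
 * Enforce layer priority. A key owned by an immutable layer can never be
 * redefined below it (error); for other conflicts, higher priority wins
 * and the lower-priority value is dropped.
 */
function validateNoOverride(bundle: ContextBundle): Record<string, unknown> {
  const merged: Record<string, unknown> = {};
  const owner = new Map<string, LayerName>();
  for (const layer of PRIORITY) {
    for (const [key, value] of Object.entries(bundle[layer])) {
      const holder = owner.get(key);
      if (holder && IMMUTABLE.has(holder)) {
        throw new Error(`layer "${layer}" tried to override "${key}" owned by immutable layer "${holder}"`);
      }
      if (!holder) {
        owner.set(key, layer);
        merged[key] = value; // first (highest-priority) writer wins
      }
    }
  }
  return merged;
}
```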

This is the security boundary against prompt injection: an attacker whose text lands in tool output, stored memory, or a delegated task gets to influence layers 4–6 at most. Layers 1–2 are unreachable.

An HTTP request to /run goes through eight steps. The same flow applies to /jobs (just with a Durable Object alarm in front) and scheduled cron triggers (no HTTP at all; goes straight to agent invocation).

Setup (before the LLM loop):

  1. HTTP client → Worker: POST /run { agent_name, instructions, payload }
  2. Worker: look up the bundled AgentDefinition for the requested agent
  3. Worker → Runtime: runTurn(definition, task, context)
  4. Runtime: assemble the 6-layer context, run validateNoOverride()

The tool loop (repeats until stop_reason = end_turn or budget exceeded):

  1. Runtime → Anthropic: complete(request)
  2. Anthropic → Runtime: response with one of four outcomes:
    • recall_memory tool call: Runtime → ToolResolver → Memory gateway → embed query → Vectorize search → D1 hydrate → tool_result back to Runtime
    • emit_event tool call: Runtime → ToolResolver → Queue producer → send(topic, payload) → ack → tool_result back to Runtime
    • delegate_to_X tool call: Runtime invokes itself recursively with the sub-agent’s definition (this is what makes delegation-as-tool work)
    • end_turn stop reason: Runtime exits the loop

Wrap-up:

  1. Runtime → Worker: AgentReport { status, summary, tool_calls, cost }
  2. Worker → HTTP client: 200 OK { ...AgentReport }
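
Put together, the loop might look roughly like this. Everything here is an assumed shape: the names, the budget numbers, and the helper stubs. It shows the control flow, not the real packages/runtime implementation:

```ts
// Illustrative sketch of the turn loop (assumed names and budget numbers).
type Task = { instructions: string; payload?: unknown };
type AgentDefinition = {
  name: string;
  sub_agents?: string[];
  autonomy: { max_delegation_depth: number };
};
interface AgentReport {
  status: "running" | "completed";
  summary: string;
  tool_calls: string[];
  cost: number; // USD
}

declare class TurnBudgetExceededError extends Error {
  constructor(partial: AgentReport);
}
declare const model: {
  complete(req: unknown): Promise<{
    stop_reason: "end_turn" | "tool_use";
    text: string;
    costUsd: number;
    tool_calls: { name: string; input: unknown }[];
  }>;
};
declare function assembleContext(def: AgentDefinition, task: Task): unknown; // runs validateNoOverride()
declare function buildRequest(context: unknown): unknown;
declare function resolveToolCall(def: AgentDefinition, call: { name: string; input: unknown }, depth: number): Promise<unknown>;
declare function appendToolResult(context: unknown, call: { name: string }, result: unknown): void;

async function runTurn(def: AgentDefinition, task: Task, depth = 0): Promise<AgentReport> {
  const context = assembleContext(def, task); // assembled once per turn, reused across LLM iterations
  const budget = { maxIterations: 8, maxCostUsd: 0.25, deadline: Date.now() + 30_000 }; // illustrative numbers
  const report: AgentReport = { status: "running", summary: "", tool_calls: [], cost: 0 };

  for (let i = 0; ; i++) {
    // Budgets are enforced at iteration boundaries, not just per LLM call
    if (i >= budget.maxIterations || report.cost > budget.maxCostUsd || Date.now() > budget.deadline) {
      throw new TurnBudgetExceededError(report); // partial state travels with the error
    }
    const res = await model.complete(buildRequest(context)); // ModelAdapter; concretely Anthropic today
    report.cost += res.costUsd;

    if (res.stop_reason === "end_turn") {
      return { ...report, status: "completed", summary: res.text };
    }
    for (const call of res.tool_calls) {
      report.tool_calls.push(call.name);
      // recall_memory, emit_event, a shopify_* read, or a synthesized delegate_to_X
      const result = await resolveToolCall(def, call, depth);
      appendToolResult(context, call, result); // becomes input to the next iteration
    }
  }
}
```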

A few things to notice in this flow:

  • The runtime is recursive. A delegate_to_X tool call invokes the runtime again with the sub-agent’s definition. No separate orchestration code path — the same loop, same budget enforcement, same error semantics. This is the unification trick from ADR-0022.
  • The agent never directly mutates the outside world. Even when the agent has decided to refund an order, it emits an event onto a queue. The queue consumer (a separate handler in the same Worker today) is what actually calls Shopify. This separation is what makes a future human-approval gate trivial to wire — the consumer simply waits.
  • All LLM and embedding calls go through provider-agnostic adapters. LLM in the diagram is concretely Anthropic today, but the runtime sees a ModelAdapter. Swapping providers is a one-package change.
  • Context assembly happens once per turn, not once per LLM call. The same six-layer bundle gets reused across multiple LLM iterations inside a single agent turn.

A specific case worth its own diagram: when one agent delegates to a sub-agent. This is the order-triage scenario in miniature.

The triage agent receives the request:

  1. User → triage: “Anna’s towel is defective. She wants a refund.”
  2. triage runs LLM turn 1, calls shopify_get_order_by_email to look up Anna’s recent orders
  3. triage runs LLM turn 2, decides to delegate, calls delegate_to_refund_decision

The runtime synthesizes the delegation tool from the sub-agent list. The tool’s handler invokes runTurn() recursively with the sub-agent’s definition.
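A sketch of that synthesis, reusing the assumed AgentDefinition shape and runTurn from the turn-loop sketch above (loadAgentDefinition is hypothetical):

```ts
// Illustrative: delegation tools synthesized from the sub-agent list (assumed names).
declare function loadAgentDefinition(name: string): AgentDefinition;

function synthesizeDelegationTools(def: AgentDefinition, depth: number) {
  return (def.sub_agents ?? []).map((subName) => ({
    name: `delegate_to_${subName}`,
    description: `Hand a sub-task to ${subName} and wait for its AgentReport.`,
    async handler(input: { instructions: string; payload?: unknown }): Promise<AgentReport> {
      // Runaway-recursion guard: refuse to go deeper than the configured limit
      if (depth + 1 > def.autonomy.max_delegation_depth) {
        throw new Error(`max_delegation_depth (${def.autonomy.max_delegation_depth}) exceeded`);
      }
      const subDef = loadAgentDefinition(subName); // the sub-agent's own YAML definition
      // Same loop, same budget enforcement, same error semantics: just runTurn again
      return runTurn(subDef, { instructions: input.instructions, payload: input.payload }, depth + 1);
    },
  }));
}
```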

The refund_decision sub-agent runs:

  1. triage → refund_decision: runTurn(refund_decision_def, sub_task)
  2. refund_decision gets a fresh 6-layer context (its own system prompt, its own characteristics, its own tool list)
  3. refund_decision → long-term memory: recall_memory("refund history for anna@...") returns 2 matches: prior valid refund, repeat order
  4. refund_decision runs LLM turn 1, reasons: clean record, valid reason → auto-approve
  5. refund_decision → shopify-actions queue: emit_event(shopify_actions, refund 49 SEK)
  6. refund_decision → triage: AgentReport { decision: "auto-approve", summary }

triage incorporates the report:

  1. triage runs LLM turn 3, summarizes the outcome
  2. triage → user: { status: "completed", summary: "..." }

A few observations:

  • refund_decision has its own context bundle. The parent’s context isn’t inherited; the sub-agent gets a fresh six-layer assembly with its own system prompt and characteristics. Only the delegated context (layer 4) carries data from the parent.
  • The sub-agent has its own tool list. refund_decision can call recall_memory (because its YAML has memory_config.long_term_enabled: true). triage can’t — not in its tool list.
  • Long-term memory is per-agent. The recall query is filtered structurally by agent_id = 'agent-refund-decision'; triage’s memories (if it had any) wouldn’t show up.
  • Events are emitted from the deepest agent that has the context to make the decision. refund_decision decides; it emits. triage could also have emitted an event, but in the current scenario it just summarizes.

What the platform defends against, and what it doesn’t.

Prompt injection via tool output, stored memory, or delegated task. The six-layer model is specifically designed for this. An attacker who lands malicious text in a memory entry can influence layers 5–6 at most; layers 1–2 (the agent’s identity and hard constraints) are unreachable.

Sub-agent runaway recursion. Each agent has an autonomy.max_delegation_depth configured. The runtime tracks delegation depth across the call chain and refuses to go deeper.

Budget exceedance. Time and cost budgets are enforced at iteration boundaries, not just per LLM call. A turn that would blow through its budget gets stopped and surfaces a TurnBudgetExceededError with the partial state captured in the report.
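
One plausible shape for that error, reusing the assumed AgentReport from the turn-loop sketch:

```ts
// Illustrative: the error carries the partial report so callers can surface it (assumed shape).
class TurnBudgetExceededError extends Error {
  constructor(public readonly partialReport: AgentReport) {
    super(
      `turn budget exceeded after ${partialReport.tool_calls.length} tool call(s), ` +
        `$${partialReport.cost.toFixed(4)} spent`,
    );
    this.name = "TurnBudgetExceededError";
  }
}
```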

Cross-agent memory leakage. Long-term memory is structurally scoped by tenant_id and agent_id via Vectorize metadata indexes. The agent itself cannot specify these — the runtime composes them from the context bundle. A buggy or malicious agent cannot retrieve another agent’s or another tenant’s memories.
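
A sketch of that composition, reusing the assumed Env from the handler sketch. Vectorize's metadata filter and D1's prepare/bind/all are real APIs; the surrounding shapes are illustrative:

```ts
// Illustrative: the runtime, not the agent, composes the scope filter.
declare function embed(text: string): Promise<number[]>; // OpenAI text-embedding-3-small, 1536 dims

async function recallMemory(
  env: Env,
  scope: { tenant_id: string; agent_id: string }, // taken from the context bundle, never from the LLM
  query: string,
) {
  const vector = await embed(query);
  const { matches } = await env.LT_MEMORY.query(vector, {
    topK: 5,
    filter: { tenant_id: scope.tenant_id, agent_id: scope.agent_id }, // Vectorize metadata indexes
  });
  if (matches.length === 0) return [];
  // Hydrate the full rows from D1, the source of truth
  const placeholders = matches.map(() => "?").join(",");
  const { results } = await env.DB
    .prepare(`SELECT * FROM long_term_memory WHERE id IN (${placeholders})`)
    .bind(...matches.map((m) => m.id))
    .all();
  return results;
}
```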

Cross-tenant queue leakage. Each event payload includes a tenant_id derived from the originating context. Consumers filter on it.

Tool whitelist violations. An agent can only call tools listed in its YAML’s tools.allowed. Even if the LLM tries to call a tool that exists in the registry but isn’t whitelisted for this agent, the runtime throws AutonomyBoundaryError before the tool runs.
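
In sketch form (assumed shapes; only the error name comes from the platform's taxonomy):

```ts
// Illustrative: the whitelist gate runs before any tool executes.
declare class AutonomyBoundaryError extends Error {}

function checkToolAllowed(def: { name: string; tools: { allowed: string[] } }, toolName: string): void {
  // The tool may exist in the registry, but this agent's YAML must whitelist it
  if (!def.tools.allowed.includes(toolName)) {
    throw new AutonomyBoundaryError(`tool "${toolName}" is not in tools.allowed for ${def.name}`);
  }
}
```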

What’s NOT defended (by design or by deferral)

LLM jailbreaks within allowed behavior. An agent’s system prompt says “don’t auto-approve refunds over $500.” A clever adversarial input might convince the LLM to approve a $499.99 refund that it shouldn’t. The platform can’t prevent within-the-allowed-policy errors; the policy itself has to be robust. Approval gates (Phase 2) move the policy enforcement out of the LLM and into a deterministic check.

Shopify-side side effects without idempotency. Today’s shopify_actions queue consumer is logs-only, so this doesn’t matter yet. Phase 2 introduces real mutations and idempotency keys per event.id so a delivered-twice event doesn’t refund twice.
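
One plausible shape for that Phase 2 consumer, reusing the assumed Env and PlatformEvent from the handler sketch. Borrowing JOBS_INDEX as the dedupe store is purely illustrative; any KV namespace or a D1 table would do:

```ts
// Illustrative Phase 2 sketch: dedupe on event.id so a delivered-twice event mutates once.
declare function performShopifyMutation(event: PlatformEvent, env: Env): Promise<void>;

async function consumeShopifyAction(msg: Message<PlatformEvent>, env: Env): Promise<void> {
  const key = `idem:shopify:${msg.body.id}`; // idempotency key per event.id
  if (await env.JOBS_INDEX.get(key)) {
    msg.ack(); // already handled on a previous delivery; do not refund twice
    return;
  }
  await performShopifyMutation(msg.body, env);
  await env.JOBS_INDEX.put(key, "done", { expirationTtl: 7 * 24 * 60 * 60 }); // keep a week
  msg.ack();
  // (KV is eventually consistent; a stricter gate could live in D1 instead.)
}
```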

Resource exhaustion attacks (DoS). A flood of /run requests from a hostile client would cost real money in LLM calls before being rate-limited. Cloudflare’s edge rate limits sit in front; per-tenant rate limits in the platform are a Phase 4 concern.

Sub-agents that share an LLM provider have a shared blast radius. If Anthropic has an outage, every agent that uses Anthropic stops working. The ModelAdapter abstraction lets us fail over to a different provider, but we don’t do that automatically.

Memory poisoning by an authorized writer. A compromised agent (one whose YAML or environment is subverted) can write malicious entries into long-term memory that get retrieved by itself or by other agents in its tenant. The defense is at the supply-chain level — only trusted operators publish agent YAMLs, only trusted code runs in the Worker.

Operational principles, not architectural choices, but they shape the implementation:

The platform doesn’t store credit cards, full SSNs, or full passport numbers. Tools that interact with payment systems (Phase 2) will reference tokenized identifiers, not raw values. Worker secrets carry API keys; nothing else.

The platform doesn’t log secrets. The @agent-platform/logger package has a redaction pass that masks any field name matching api_key, authorization, password, token, secret. The redaction is opt-out — the default is “if it looks like a secret, redact it.”
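
A minimal sketch of such a pass (the real @agent-platform/logger surely differs in detail):

```ts
// Illustrative redaction pass: mask any field whose name looks secret-bearing.
const SECRET_FIELD = /(api_key|authorization|password|token|secret)/i;

function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) =>
        SECRET_FIELD.test(k) ? [k, "[REDACTED]"] : [k, redact(v)],
      ),
    );
  }
  return value;
}
```

So a field like shopify_token would be masked too, because the name contains token; the match is on field names, not values.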

The platform doesn’t trust unauthenticated input. The HTTP API requires a Bearer token on every endpoint except /health. Cron triggers come from Cloudflare directly (no HTTP, no header check needed; the source is structural).
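
In sketch form (API_TOKEN is an assumed Worker secret name, not necessarily the real one):

```ts
// Illustrative auth gate: /health is the one open endpoint; everything else needs a Bearer token.
function authorize(request: Request, env: { API_TOKEN: string }): Response | null {
  if (new URL(request.url).pathname === "/health") return null; // open
  const header = request.headers.get("Authorization");
  if (header !== `Bearer ${env.API_TOKEN}`) {
    return new Response("unauthorized", { status: 401 });
  }
  return null; // authenticated; continue to routing
}
```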

The platform doesn’t write to systems it doesn’t own. Phase 1’s only write paths are to Cloudflare D1 + Vectorize (internal) and to log streams. Shopify mutations are queued, not executed (Phase 2). External systems are read-only.

Phase 1’s measured numbers, projected forward:

| Dimension | Today | At 10K agent runs/day | At 1M agent runs/day |
| --- | --- | --- | --- |
| LLM cost | ~$0.05/run (Sonnet + Haiku) | ~$500/day | ~$50K/day |
| Embedding cost | ~$0.0000004/recall | negligible | ~$5/day |
| D1 read ops | ~5/run | 50K/day (free tier covers 5M) | 5M/day (free tier covers; bursty) |
| Vectorize queries | ~1/recall | 10K/day | 1M/day |
| Worker requests | 1/run sync, 2/run async | well under free tier | $5-50 above free tier |
| D1 storage | ~10 KB/agent definition + memory rows | KB/MB scale | GB scale |
| Vectorize storage | 1536 dims × ~100 entries | 1.5M stored dimensions | 1.5B stored dimensions ($75/mo) |

Phase 1 doesn’t approach any of these limits. The shapes matter for Phase 4 multi-tenant; the early signal is that Cloudflare’s free tiers cover surprising scale, and the dominant cost at every horizon is LLM calls.

If you’re still ramping:

  • Testing for what’s tested, what’s not, and the testing-fidelity gap we hit twice in Phase 1
  • Scenarios (commits 5–7) for end-to-end walkthroughs of order-triage, merchandising, and the B2B SaaS hypothetical that uses every feature

If you want decision-level depth: