Architecture
The synthesis page. If Concepts explains what the platform’s primitives are and the package tour explains how each one is implemented, this page explains how the whole thing fits together.
Read this after you’ve absorbed the primitives. It maps the six-layer context model onto real Cloudflare resources, walks the path of an HTTP request from edge to LLM to event bus, and states what the platform’s threat model actually defends against.
What runs where
The platform is one Cloudflare Worker plus six bound Cloudflare resources and three external APIs (LLM, embeddings, Shopify). One deploy unit; six runtime resources at the edge.
| What | Where | What it does |
|---|---|---|
| Worker (apps/worker) | Cloudflare edge | The runtime. Three handler types: fetch (HTTP API), scheduled (cron), queue (event consumer) |
| AgentJob Durable Object | Cloudflare edge | Runs async agent jobs (/jobs) to completion via alarm-driven execution |
| JOBS_INDEX KV namespace | Cloudflare edge | Discoverable index of submitted async jobs |
| D1 (agent_platform) | Cloudflare edge | SQL database; source of truth for long_term_memory rows |
| Vectorize (agent-platform-lt-memory) | Cloudflare edge | Vector index, 1536d cosine; semantic search for long-term memory |
| human-review queue | Cloudflare edge | Async events for human escalation |
| shopify-actions queue | Cloudflare edge | Async events for Shopify mutations |
| Anthropic (Sonnet / Haiku) | External API | LLM provider for agent reasoning |
| OpenAI (text-embedding-3-small) | External API | Embedding provider for semantic recall |
| Shopify Admin (GraphQL) | External API | Order / product / shop reads |
Inbound: HTTP clients (curl, browser) hit the Worker via /run (sync) or /jobs (async). Cron triggers come from Cloudflare directly — no HTTP, no auth.
Outbound: the Worker is both producer and consumer of the two queues — same code, different handler types. It calls Anthropic for every agent turn, OpenAI for every memory recall, and Shopify for order lookups.
Each connection is a resource binding declared in apps/worker/wrangler.toml.
See the Stack section for per-service depth.
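The bindings in the table above map onto `wrangler.toml` entries roughly like this. This is an illustrative sketch only: the binding names, IDs, and the file’s other fields are assumptions, not the repo’s actual config.

```toml
name = "agent-platform-worker"
main = "src/index.ts"

# Durable Object that drives /jobs via alarms
[[durable_objects.bindings]]
name = "AGENT_JOB"
class_name = "AgentJob"

# KV index of submitted async jobs
[[kv_namespaces]]
binding = "JOBS_INDEX"
id = "<kv-namespace-id>"

# D1: source of truth for long_term_memory rows
[[d1_databases]]
binding = "DB"
database_name = "agent_platform"
database_id = "<d1-database-id>"

# Vectorize: 1536d cosine index for semantic recall
[[vectorize]]
binding = "LT_MEMORY"
index_name = "agent-platform-lt-memory"

# The Worker is both producer and consumer of both queues
[[queues.producers]]
queue = "human-review"
binding = "HUMAN_REVIEW"

[[queues.producers]]
queue = "shopify-actions"
binding = "SHOPIFY_ACTIONS"

[[queues.consumers]]
queue = "human-review"

[[queues.consumers]]
queue = "shopify-actions"
```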
The two layers
The platform’s outermost split, from ADR-0005. Two layers, with a strict dependency direction: business packs depend on platform core, never the reverse.
| Layer | Business-aware? | What’s in it (today) | Examples |
|---|---|---|---|
| Platform Core | No | Runtime, context assembler, memory subsystem, tool registry, event bus, agent loader, model adapters, error taxonomy | packages/core, packages/runtime, packages/memory, packages/event-bus, packages/agent-loader, packages/llm-anthropic, packages/embeddings-openai |
| Business Packs | Yes | Domain-specific tools, integrations, agent definitions | packages/shopify, apps/worker/agents/*.yaml |
The discipline: anything e-commerce-shaped in a Platform Core package is a code-review smell. Adding a new vertical (B2B SaaS, healthcare, fintech) is a new business pack — additive, zero edits to core.
The six context layers
The platform’s most distinctive design choice. Six strictly ordered layers, with the top two immutable. The full reasoning lives in ADR-0006; the six layers in summary:
| Priority | Layer | Source | Mutability |
|---|---|---|---|
| 1 | Core Context — system prompt, hard constraints | Agent definition (YAML) | Immutable |
| 2 | Characteristics — personality, decision style | Agent definition (YAML) | Immutable |
| 3 | Shared Context — date, tenant, environment flags | Shared store | Read-only by agent |
| 4 | Delegated Context — instructions, payload | Parent agent’s task spec | Per-task |
| 5 | Working Memory — session log | Current turn | Ephemeral (sliding window) |
| 6 | Long-term Memory — vector search results | Vectorize + D1 | Persistent (per-agent) |
Higher priority wins on conflict. A retrieved long-term
memory cannot override the agent’s hard constraints. A delegated
task cannot override the agent’s identity. The runtime enforces
this through validateNoOverride() on every context assembly.
This is the security boundary against prompt injection: an attacker whose text lands in tool output, stored memory, or a delegated task gets to influence layers 4–6 at most. Layers 1–2 are unreachable.
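The layer-priority rule can be sketched as a simple scan: walk the layers from highest priority to lowest, and reject any assembly where a lower layer tries to redefine a key already owned by a higher one. The names and shapes below are assumptions for illustration, not the platform’s actual API.

```typescript
// Hypothetical sketch of the six-layer priority rule.
type LayerName =
  | "core" | "characteristics" | "shared"
  | "delegated" | "working" | "longTerm";

// Ordered highest-priority first; the top two layers are immutable.
const LAYER_ORDER: LayerName[] = [
  "core", "characteristics", "shared", "delegated", "working", "longTerm",
];

type ContextBundle = Record<LayerName, Record<string, unknown>>;

// Reject any assembly where a lower-priority layer redefines a key
// already owned by a higher-priority layer.
function validateNoOverride(bundle: ContextBundle): void {
  const seen = new Map<string, LayerName>();
  for (const layer of LAYER_ORDER) {
    for (const key of Object.keys(bundle[layer])) {
      const owner = seen.get(key);
      if (owner !== undefined) {
        throw new Error(
          `layer "${layer}" attempts to override "${key}" owned by "${owner}"`,
        );
      }
      seen.set(key, layer);
    }
  }
}
```

Because the scan starts at the immutable layers, injected text in layers 4–6 can add keys but never shadow the agent’s identity or hard constraints.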
How a request flows
An HTTP request to /run goes through eight steps. The same flow
applies to /jobs (just with a Durable Object alarm in front)
and scheduled cron triggers (no HTTP at all; goes straight to
agent invocation).
Setup (before the LLM loop):
- HTTP client → Worker: `POST /run { agent_name, instructions, payload }`
- Worker: look up the bundled `AgentDefinition` for the requested agent
- Worker → Runtime: `runTurn(definition, task, context)`
- Runtime: assemble the 6-layer context, run `validateNoOverride()`
The tool loop (repeats until stop_reason = end_turn or budget exceeded):
- Runtime → Anthropic: `complete(request)`
- Anthropic → Runtime: a response carrying one of three tool calls, or an `end_turn` stop reason:
  - `recall_memory` tool call: Runtime → ToolResolver → Memory gateway → embed query → Vectorize search → D1 hydrate → `tool_result` back to Runtime
  - `emit_event` tool call: Runtime → ToolResolver → Queue producer → `send(topic, payload)` → ack → `tool_result` back to Runtime
  - `delegate_to_X` tool call: Runtime invokes itself recursively with the sub-agent’s definition (this is what makes delegation-as-tool work)
  - `end_turn` stop reason: Runtime exits the loop
Wrap-up:
- Runtime → Worker: `AgentReport { status, summary, tool_calls, cost }`
- Worker → HTTP client: `200 OK { ...AgentReport }`
A few things to notice in this flow:
- The runtime is recursive. A `delegate_to_X` tool call invokes the runtime again with the sub-agent’s definition. No separate orchestration code path — the same loop, same budget enforcement, same error semantics. This is the unification trick from ADR-0022.
- The agent never directly mutates the outside world. Even when the agent has decided to refund an order, it emits an event onto a queue. The queue consumer (a separate handler in the same Worker today) is what actually calls Shopify. This separation is what makes a future human-approval gate trivial to wire — the consumer simply waits.
- All LLM and embedding calls go through provider-agnostic adapters. `LLM` in the diagram is concretely Anthropic today, but the runtime sees a `ModelAdapter`. Swapping providers is a one-package change.
- Context assembly happens once per turn, not once per LLM call. The same six-layer bundle gets reused across multiple LLM iterations inside a single agent turn.
How delegation flows
A specific case worth its own diagram: when one agent delegates to a sub-agent. This is the order-triage scenario in miniature.
The triage agent receives the request:
- User → triage: “Anna’s towel is defective. She wants a refund.”
- triage runs LLM turn 1, calls `shopify_get_order_by_email` to look up Anna’s recent orders
- triage runs LLM turn 2, decides to delegate, calls `delegate_to_refund_decision`
The runtime synthesizes the delegation tool from the sub-agent list. The tool’s handler invokes runTurn() recursively with the sub-agent’s definition.
The refund_decision sub-agent runs:
- triage → refund_decision: `runTurn(refund_decision_def, sub_task)`
- refund_decision gets a fresh 6-layer context (its own system prompt, its own characteristics, its own tool list)
- refund_decision → long-term memory: `recall_memory("refund history for anna@...")` returns 2 matches: prior valid refund, repeat order
- refund_decision runs LLM turn 1, reasons: clean record, valid reason → auto-approve
- refund_decision → `shopify-actions` queue: `emit_event(shopify_actions, refund 49 SEK)`
- refund_decision → triage: `AgentReport { decision: "auto-approve", summary }`
triage incorporates the report:
- triage runs LLM turn 3, summarizes the outcome
- triage → user: `{ status: "completed", summary: "..." }`
A few observations:
- `refund_decision` has its own context bundle. The parent’s context isn’t inherited; the sub-agent gets a fresh six-layer assembly with its own system prompt and characteristics. Only the delegated context (layer 4) carries data from the parent.
- The sub-agent has its own tool list. `refund_decision` can call `recall_memory` (because its YAML has `memory_config.long_term_enabled: true`). `triage` can’t — not in its tool list.
- Long-term memory is per-agent. The recall query is filtered structurally by `agent_id = 'agent-refund-decision'`; triage’s memories (if it had any) wouldn’t show up.
- Events are emitted from the deepest agent that has the context to make the decision. `refund_decision` decides; it emits. `triage` could also have emitted an event, but in the current scenario it just summarizes.
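Delegation-as-tool can be sketched as follows: each sub-agent in the parent’s definition becomes a synthesized `delegate_to_<name>` tool whose handler re-enters the same runtime with the sub-agent’s definition and an incremented depth. All names here are assumptions for illustration.

```typescript
// Hypothetical sketch of delegation-tool synthesis and depth tracking.
interface AgentDefinition { name: string; subAgents: AgentDefinition[] }
interface Tool {
  name: string;
  handler: (task: unknown) => Promise<string>;
}

const MAX_DELEGATION_DEPTH = 2; // would come from autonomy.max_delegation_depth

async function runAgent(def: AgentDefinition, task: unknown, depth = 0): Promise<string> {
  if (depth > MAX_DELEGATION_DEPTH) {
    throw new Error("delegation depth exceeded");
  }
  const tools = synthesizeDelegationTools(def, depth);
  // ...assemble a fresh six-layer context for `def`, then run the tool loop.
  // This sketch just reports which delegation tools were available.
  return `${def.name}: [${tools.map((t) => t.name).join(", ")}]`;
}

function synthesizeDelegationTools(def: AgentDefinition, depth: number): Tool[] {
  return def.subAgents.map((sub) => ({
    name: `delegate_to_${sub.name}`,
    // Recursive re-entry: the sub-agent gets its own context and tool list.
    handler: (task) => runAgent(sub, task, depth + 1),
  }));
}
```

The depth parameter is what turns the threat-model guarantee below ("refuses to go deeper") into a one-line check at the top of the recursion.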
Threat model
What the platform defends against, and what it doesn’t.
What’s defended
Prompt injection via tool output, stored memory, or delegated task. The six-layer model is specifically designed for this. An attacker who lands malicious text in a memory entry can influence layers 5–6 at most; layers 1–2 (the agent’s identity and hard constraints) are unreachable.
Sub-agent runaway recursion. Each agent has an
autonomy.max_delegation_depth configured. The runtime tracks
delegation depth across the call chain and refuses to go deeper.
Budget exceedance. Time and cost budgets are enforced at
iteration boundaries, not just per LLM call. A turn that
would blow through its budget gets stopped and surfaces a
TurnBudgetExceededError with the partial state captured in
the report.
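An iteration-boundary budget check might look like the sketch below. The error class is named on this page; its fields, and the budget shape, are assumptions.

```typescript
// Sketch of budget enforcement between LLM iterations.
class TurnBudgetExceededError extends Error {
  constructor(public readonly partial: { iterations: number; costUsd: number }) {
    super("turn budget exceeded");
  }
}

interface TurnBudget { maxCostUsd: number; maxMillis: number }

function checkBudget(
  budget: TurnBudget,
  spentUsd: number,
  startedAtMs: number,
  nowMs: number,
  iterations: number,
): void {
  // Checked at iteration boundaries, not only per LLM call, so a runaway
  // turn is stopped with its partial state preserved for the report.
  if (spentUsd > budget.maxCostUsd || nowMs - startedAtMs > budget.maxMillis) {
    throw new TurnBudgetExceededError({ iterations, costUsd: spentUsd });
  }
}
```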
Cross-agent memory leakage. Long-term memory is
structurally scoped by tenant_id and agent_id via Vectorize
metadata indexes. The agent itself cannot specify these — the
runtime composes them from the context bundle. A buggy or
malicious agent cannot retrieve another agent’s or another
tenant’s memories.
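"The agent itself cannot specify these" is the load-bearing property, and it is easy to state in code: the scoping filter is a pure function of the context bundle, and the agent’s query text never touches it. The function and field names below are assumptions about the real gateway.

```typescript
// Sketch of structural memory scoping: the runtime, not the agent,
// composes the Vectorize metadata filter from the context bundle.
interface MemoryContext { tenantId: string; agentId: string }

// The agent supplies only the query text; scoping keys are never agent inputs.
function buildRecallFilter(ctx: MemoryContext): Record<string, string> {
  return { tenant_id: ctx.tenantId, agent_id: ctx.agentId };
}

// Conceptually this filter then rides along on every vector query, e.g.
// something like: index.query(embedding, { topK: 5, filter: buildRecallFilter(ctx) })
```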
Cross-tenant queue leakage. Each event payload includes a
tenant_id derived from the originating context. Consumers
filter on it.
Tool whitelist violations. An agent can only call tools
listed in its YAML’s tools.allowed. Even if the LLM tries
to call a tool that exists in the registry but isn’t whitelisted
for this agent, the runtime throws AutonomyBoundaryError
before the tool runs.
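The whitelist check reduces to a membership test before dispatch. `AutonomyBoundaryError` and `tools.allowed` are named on this page; the check’s exact shape is an assumption.

```typescript
// Sketch of the per-agent tool whitelist gate.
class AutonomyBoundaryError extends Error {}

interface AgentToolConfig { allowed: string[] }

function assertToolAllowed(config: AgentToolConfig, toolName: string): void {
  // A tool may exist in the global registry yet still be off-limits for
  // this agent; only the per-agent whitelist is consulted here.
  if (!config.allowed.includes(toolName)) {
    throw new AutonomyBoundaryError(`tool "${toolName}" not in tools.allowed`);
  }
}
```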
What’s NOT defended (by design or by deferral)
LLM jailbreaks within allowed behavior. An agent’s system prompt says “don’t auto-approve refunds over $500.” A clever adversarial input might convince the LLM to approve a $499.99 refund that it shouldn’t. The platform can’t prevent within-the-allowed-policy errors; the policy itself has to be robust. Approval gates (Phase 2) move the policy enforcement out of the LLM and into a deterministic check.
Shopify-side side effects without idempotency. Today’s
shopify_actions queue consumer is logs-only, so this doesn’t
matter yet. Phase 2 introduces real mutations and idempotency
keys per event.id so a delivered-twice event doesn’t refund
twice.
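The planned per-`event.id` idempotency guard could be as small as the sketch below. The in-memory set stands in for whatever durable storage Phase 2 actually uses; all names are assumptions.

```typescript
// Sketch of an idempotent queue consumer keyed on event.id.
interface QueueEvent { id: string; action: string }

class IdempotentConsumer {
  private processed = new Set<string>(); // durable storage in practice

  // Returns true if the event was handled, false if it was a duplicate.
  async handle(
    event: QueueEvent,
    perform: (e: QueueEvent) => Promise<void>,
  ): Promise<boolean> {
    if (this.processed.has(event.id)) return false; // delivered twice: no second refund
    this.processed.add(event.id);
    await perform(event);
    return true;
  }
}
```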
Resource exhaustion attacks (DoS). A flood of /run
requests from a hostile client would cost real money in LLM
calls before being rate-limited. Cloudflare’s edge rate limits
sit in front; per-tenant rate limits in the platform are a
Phase 4 concern.
Sub-agents that share an LLM provider have a shared blast
radius. If Anthropic has an outage, every agent that uses
Anthropic stops working. The ModelAdapter abstraction lets
us fail over to a different provider, but we don’t do that
automatically.
Memory poisoning by an authorized writer. A compromised agent (one whose YAML or environment is subverted) can write malicious entries into long-term memory that get retrieved by itself or by other agents in its tenant. The defense is at the supply-chain level — only trusted operators publish agent YAMLs, only trusted code runs in the Worker.
Boundaries we don’t cross
Operational principles, not architectural choices, but they shape the implementation:
The platform doesn’t store credit cards, full SSNs, or full passport numbers. Tools that interact with payment systems (Phase 2) will reference tokenized identifiers, not raw values. Worker secrets carry API keys; nothing else.
The platform doesn’t log secrets. The
@agent-platform/logger package has a redaction pass that
masks any field name matching api_key, authorization,
password, token, secret. The redaction is opt-out — the
default is “if it looks like a secret, redact it.”
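A redact-by-default pass over log fields can be sketched in a few lines. The matched field names come from this page; the implementation shape is an assumption about `@agent-platform/logger`.

```typescript
// Sketch of the logger's redaction pass: mask any field whose name
// looks like a secret, pass everything else through.
const SECRET_FIELDS = /(api_key|authorization|password|token|secret)/i;

function redact(fields: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(fields)) {
    // Opt-out semantics: if it looks like a secret, redact it.
    out[key] = SECRET_FIELDS.test(key) ? "[REDACTED]" : value;
  }
  return out;
}
```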
The platform doesn’t trust unauthenticated input. The HTTP
API requires a Bearer token on every endpoint except /health.
Cron triggers come from Cloudflare directly (no HTTP, no
header check needed; the source is structural).
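The auth boundary described above is a single conditional in front of every route. The route names come from this page; the function itself is an illustrative assumption, not the Worker’s actual code.

```typescript
// Sketch of the bearer-token gate: /health is the only open endpoint,
// everything else requires a matching Bearer token.
function isAuthorized(path: string, authHeader: string | null, expected: string): boolean {
  if (path === "/health") return true; // only unauthenticated endpoint
  return authHeader === `Bearer ${expected}`;
}
```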
The platform doesn’t write to systems it doesn’t own. Phase 1’s only write paths are to Cloudflare D1 + Vectorize (internal) and to log streams. Shopify mutations are queued, not executed (Phase 2). External systems are read-only.
What this looks like at scale
Phase 1’s measured numbers, projected forward:
| Dimension | Today | At 10K agent runs/day | At 1M agent runs/day |
|---|---|---|---|
| LLM cost | ~$0.05/run (Sonnet+Haiku) | ~$500/day | ~$50K/day |
| Embedding cost | ~$0.0000004/recall | negligible | ~$5/day |
| D1 read ops | ~5/run | 50K/day (free tier covers 5M) | 5M/day (free tier covers; bursty) |
| Vectorize queries | ~1/recall | 10K/day | 1M/day |
| Worker requests | 1/run sync, 2/run async | well under free tier | $5-50 above free tier |
| D1 storage | ~10 KB/agent definition + memory rows | KB/MB scale | GB scale |
| Vectorize storage | 1536 dim × ~100 entries today | 1.5M stored dimensions | 1.5B stored dimensions ($75/mo) |
Phase 1 doesn’t approach any of these limits. The shapes matter for Phase 4 multi-tenant; the early signal is that Cloudflare’s free tiers cover surprising scale, and the dominant cost at every horizon is LLM calls.
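The LLM-cost column in the table is straight multiplication from the measured per-run figure; a back-of-envelope check:

```typescript
// ~$0.05/run (Sonnet+Haiku blend, measured in Phase 1) scaled by volume.
const llmCostPerRunUsd = 0.05;
const dailyLlmCostUsd = (runsPerDay: number) => runsPerDay * llmCostPerRunUsd;
// 10K runs/day → ~$500/day; 1M runs/day → ~$50K/day, matching the table.
```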
Where to next
If you’re still ramping:
- Testing for what’s tested, what’s not, and the testing-fidelity gap we hit twice in Phase 1
- Scenarios (commits 5–7) for end-to-end walkthroughs of order-triage, merchandising, and the B2B SaaS hypothetical that uses every feature
If you want decision-level depth: