Architecture

The synthesis page. If Concepts explains what the platform’s primitives are and the package tour explains how each one is implemented, this page explains how the whole thing fits together.

Read this after you’ve absorbed the primitives. It maps the six-layer context model onto real Cloudflare resources, walks the path of an HTTP request from edge to LLM to event bus, and states what the platform’s threat model actually defends against.

The platform is one Cloudflare Worker plus six bound Cloudflare resources (a Durable Object, a KV namespace, D1, Vectorize, and two queues) and three external APIs (Anthropic for the LLM, OpenAI for embeddings, Shopify for commerce data). One deploy unit; six runtime resources.

| What | Where | What it does |
| --- | --- | --- |
| Worker (apps/worker) | Cloudflare edge | The runtime. Three handler types: fetch (HTTP API), scheduled (cron), queue (event consumer) |
| AgentJob Durable Object | Cloudflare edge | Runs async agent jobs (/jobs) to completion via alarm-driven execution |
| JOBS_INDEX KV namespace | Cloudflare edge | Discoverable index of submitted async jobs |
| D1 (agent_platform) | Cloudflare edge | SQL database; source of truth for long_term_memory rows |
| Vectorize (agent-platform-lt-memory) | Cloudflare edge | Vector index, 1536d cosine; semantic search for long-term memory |
| human-review queue | Cloudflare edge | Async events for human escalation |
| shopify-actions queue | Cloudflare edge | Async events for Shopify mutations |
| Anthropic (Sonnet / Haiku) | External API | LLM provider for agent reasoning |
| OpenAI (text-embedding-3-small) | External API | Embedding provider for semantic recall |
| Shopify Admin (GraphQL) | External API | Order / product / shop reads |

Inbound: HTTP clients (curl, a browser) hit the Worker via /run (sync) or /jobs (async). Cron triggers come from Cloudflare directly: no HTTP, no auth.

Outbound: the Worker is both producer and consumer of the two queues — same code, different handler types. It calls Anthropic for every agent turn, OpenAI for every memory recall, and Shopify for order lookups.

Each connection is a resource binding declared in apps/worker/wrangler.toml. See the Stack section for per-service depth.
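
To make "same code, different handler types" concrete, here is a minimal sketch of that one-Worker shape in module-Worker style. The binding names (DB, LT_MEMORY, and so on) and the helper functions are assumptions for illustration, not the real apps/worker code:

```ts
// Illustrative sketch: one Worker, three handler types.
// Binding and helper names are assumptions, not the real apps/worker code.
interface Env {
  DB: D1Database;                    // D1 (agent_platform)
  LT_MEMORY: VectorizeIndex;         // Vectorize (agent-platform-lt-memory)
  JOBS_INDEX: KVNamespace;           // async-job index
  HUMAN_REVIEW: Queue;               // human-review queue (producer side)
  SHOPIFY_ACTIONS: Queue;            // shopify-actions queue (producer side)
  AGENT_JOB: DurableObjectNamespace; // AgentJob Durable Object
}

type PlatformEvent = { id: string; topic: string; tenant_id: string; payload: unknown };

declare function routeHttp(request: Request, env: Env): Promise<Response>;
declare function runScheduledAgents(cron: string, env: Env): Promise<void>;
declare function handleEvent(event: PlatformEvent, env: Env): Promise<void>;

export default {
  // HTTP API: /run (sync), /jobs (async), /health
  async fetch(request, env, ctx) {
    return routeHttp(request, env);
  },
  // Cron: invoked by Cloudflare directly; no HTTP request involved
  async scheduled(event, env, ctx) {
    await runScheduledAgents(event.cron, env);
  },
  // Queue consumer: the same Worker that produced the messages
  async queue(batch, env) {
    for (const msg of batch.messages) {
      await handleEvent(msg.body as PlatformEvent, env);
      msg.ack();
    }
  },
} satisfies ExportedHandler<Env>;
```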

The platform’s outermost split, from ADR-0005. Two layers, with a strict dependency direction: business packs depend on platform core, never the reverse.

| Layer | Business-aware? | What's in it (today) | Examples |
| --- | --- | --- | --- |
| Platform Core | No | Runtime, context assembler, memory subsystem, tool registry, event bus, agent loader, model adapters, error taxonomy | packages/core, packages/runtime, packages/memory, packages/event-bus, packages/agent-loader, packages/llm-anthropic, packages/embeddings-openai |
| Business Packs | Yes | Domain-specific tools, integrations, agent definitions | packages/shopify, apps/worker/agents/*.yaml |

The discipline: anything e-commerce-shaped in a Platform Core package is a code-review smell. Adding a new vertical (B2B SaaS, healthcare, fintech) is a new business pack — additive, zero edits to core.

The platform’s most distinctive design choice. Six strictly ordered layers, with the top two immutable. The full reasoning lives in ADR-0006; the six layers in summary:

| Priority | Layer | Source | Mutability |
| --- | --- | --- | --- |
| 1 | Core Context — system prompt, hard constraints | Agent definition (YAML) | Immutable |
| 2 | Characteristics — personality, decision style | Agent definition (YAML) | Immutable |
| 3 | Shared Context — date, tenant, environment flags | Shared store | Read-only by agent |
| 4 | Delegated Context — instructions, payload | Parent agent's task spec | Per-task |
| 5 | Working Memory — session log | Current turn | Ephemeral (sliding window) |
| 6 | Long-term Memory — vector search results | Vectorize + D1 | Persistent (per-agent) |

Higher priority wins on conflict. A retrieved long-term memory cannot override the agent’s hard constraints. A delegated task cannot override the agent’s identity. The runtime enforces this through validateNoOverride() on every context assembly.
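
A minimal sketch of what that check could look like, assuming a flat key-value shape per layer (the real packages/runtime types differ):

```ts
// Illustrative sketch of priority-ordered context assembly (assumed shapes).
type LayerName = "core" | "characteristics" | "shared" | "delegated" | "working" | "longTerm";

// Earlier in this list = higher priority; the first two layers are immutable.
const PRIORITY: LayerName[] = ["core", "characteristics", "shared", "delegated", "working", "longTerm"];
const IMMUTABLE = new Set<LayerName>(["core", "characteristics"]);

type ContextBundle = Record<LayerName, Record<string, unknown>>;

/**
 * Enforce layer priority. A key owned by an immutable layer can never be
 * redefined below it (error); for other conflicts, higher priority wins
 * and the lower-priority value is dropped.
 */
function validateNoOverride(bundle: ContextBundle): Record<string, unknown> {
  const merged: Record<string, unknown> = {};
  const owner = new Map<string, LayerName>();
  for (const layer of PRIORITY) {
    for (const [key, value] of Object.entries(bundle[layer])) {
      const holder = owner.get(key);
      if (holder && IMMUTABLE.has(holder)) {
        throw new Error(`layer "${layer}" tried to override "${key}" owned by immutable layer "${holder}"`);
      }
      if (!holder) {
        owner.set(key, layer);
        merged[key] = value; // first (highest-priority) writer wins
      }
    }
  }
  return merged;
}
```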

This is the security boundary against prompt injection: an attacker whose text lands in tool output, stored memory, or a delegated task gets to influence layers 4–6 at most. Layers 1–2 are unreachable.

An HTTP request to /run goes through eight steps. The same flow applies to /jobs (just with a Durable Object alarm in front) and scheduled cron triggers (no HTTP at all; goes straight to agent invocation).

Setup (before the LLM loop):

  1. HTTP client → Worker: POST /run { agent_name, instructions, payload }
  2. Worker: look up the bundled AgentDefinition for the requested agent
  3. Worker → Runtime: runTurn(definition, task, context)
  4. Runtime: assemble the 6-layer context, run validateNoOverride()

The tool loop (repeats until stop_reason = end_turn or budget exceeded):

  1. Runtime → Anthropic: complete(request)
  2. Anthropic → Runtime: response with one of four outcomes:
    • recall_memory tool call: Runtime → ToolResolver → Memory gateway → embed query → Vectorize search → D1 hydrate → tool_result back to Runtime
    • emit_event tool call: Runtime → ToolResolver → Queue producer → send(topic, payload) → ack → tool_result back to Runtime
    • delegate_to_X tool call: Runtime invokes itself recursively with the sub-agent’s definition (this is what makes delegation-as-tool work)
    • end_turn stop reason: Runtime exits the loop

Wrap-up:

  1. Runtime → Worker: AgentReport { status, summary, tool_calls, cost }
  2. Worker → HTTP client: 200 OK { ...AgentReport }
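
Put together, the loop might look roughly like this. Everything here is an assumed shape: the names, the budget numbers, and the helper stubs. It shows the control flow, not the real packages/runtime implementation:

```ts
// Illustrative sketch of the turn loop (assumed names and budget numbers).
type Task = { instructions: string; payload?: unknown };
type AgentDefinition = {
  name: string;
  sub_agents?: string[];
  autonomy: { max_delegation_depth: number };
};
interface AgentReport {
  status: "running" | "completed";
  summary: string;
  tool_calls: string[];
  cost: number; // USD
}

declare class TurnBudgetExceededError extends Error {
  constructor(partial: AgentReport);
}
declare const model: {
  complete(req: unknown): Promise<{
    stop_reason: "end_turn" | "tool_use";
    text: string;
    costUsd: number;
    tool_calls: { name: string; input: unknown }[];
  }>;
};
declare function assembleContext(def: AgentDefinition, task: Task): unknown; // runs validateNoOverride()
declare function buildRequest(context: unknown): unknown;
declare function resolveToolCall(def: AgentDefinition, call: { name: string; input: unknown }, depth: number): Promise<unknown>;
declare function appendToolResult(context: unknown, call: { name: string }, result: unknown): void;

async function runTurn(def: AgentDefinition, task: Task, depth = 0): Promise<AgentReport> {
  const context = assembleContext(def, task); // assembled once per turn, reused across LLM iterations
  const budget = { maxIterations: 8, maxCostUsd: 0.25, deadline: Date.now() + 30_000 }; // illustrative numbers
  const report: AgentReport = { status: "running", summary: "", tool_calls: [], cost: 0 };

  for (let i = 0; ; i++) {
    // Budgets are enforced at iteration boundaries, not just per LLM call
    if (i >= budget.maxIterations || report.cost > budget.maxCostUsd || Date.now() > budget.deadline) {
      throw new TurnBudgetExceededError(report); // partial state travels with the error
    }
    const res = await model.complete(buildRequest(context)); // ModelAdapter; concretely Anthropic today
    report.cost += res.costUsd;

    if (res.stop_reason === "end_turn") {
      return { ...report, status: "completed", summary: res.text };
    }
    for (const call of res.tool_calls) {
      report.tool_calls.push(call.name);
      // recall_memory, emit_event, a shopify_* read, or a synthesized delegate_to_X
      const result = await resolveToolCall(def, call, depth);
      appendToolResult(context, call, result); // becomes input to the next iteration
    }
  }
}
```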

A few things to notice in this flow:

  • The runtime is recursive. A delegate_to_X tool call invokes the runtime again with the sub-agent’s definition. No separate orchestration code path — the same loop, same budget enforcement, same error semantics. This is the unification trick from ADR-0022.
  • The agent never directly mutates the outside world. Even when the agent has decided to refund an order, it emits an event onto a queue. The queue consumer (a separate handler in the same Worker today) is what actually calls Shopify. This separation is what makes a future human-approval gate trivial to wire — the consumer simply waits.
  • All LLM and embedding calls go through provider-agnostic adapters. LLM in the diagram is concretely Anthropic today, but the runtime sees a ModelAdapter. Swapping providers is a one-package change.
  • Context assembly happens once per turn, not once per LLM call. The same six-layer bundle gets reused across multiple LLM iterations inside a single agent turn.

A specific case worth its own diagram: when one agent delegates to a sub-agent. This is the order-triage scenario in miniature.

The triage agent receives the request:

  1. User → triage: “Anna’s towel is defective. She wants a refund.”
  2. triage runs LLM turn 1, calls shopify_get_order_by_email to look up Anna’s recent orders
  3. triage runs LLM turn 2, decides to delegate, calls delegate_to_refund_decision

The runtime synthesizes the delegation tool from the sub-agent list. The tool’s handler invokes runTurn() recursively with the sub-agent’s definition.
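A sketch of that synthesis, reusing the assumed AgentDefinition shape and runTurn from the turn-loop sketch above (loadAgentDefinition is hypothetical):

```ts
// Illustrative: delegation tools synthesized from the sub-agent list (assumed names).
declare function loadAgentDefinition(name: string): AgentDefinition;

function synthesizeDelegationTools(def: AgentDefinition, depth: number) {
  return (def.sub_agents ?? []).map((subName) => ({
    name: `delegate_to_${subName}`,
    description: `Hand a sub-task to ${subName} and wait for its AgentReport.`,
    async handler(input: { instructions: string; payload?: unknown }): Promise<AgentReport> {
      // Runaway-recursion guard: refuse to go deeper than the configured limit
      if (depth + 1 > def.autonomy.max_delegation_depth) {
        throw new Error(`max_delegation_depth (${def.autonomy.max_delegation_depth}) exceeded`);
      }
      const subDef = loadAgentDefinition(subName); // the sub-agent's own YAML definition
      // Same loop, same budget enforcement, same error semantics: just runTurn again
      return runTurn(subDef, { instructions: input.instructions, payload: input.payload }, depth + 1);
    },
  }));
}
```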

The refund_decision sub-agent runs:

  1. triage → refund_decision: runTurn(refund_decision_def, sub_task)
  2. refund_decision gets a fresh 6-layer context (its own system prompt, its own characteristics, its own tool list)
  3. refund_decision → long-term memory: recall_memory("refund history for anna@...") returns 2 matches: prior valid refund, repeat order
  4. refund_decision runs LLM turn 1, reasons: clean record, valid reason → auto-approve
  5. refund_decision → shopify-actions queue: emit_event(shopify_actions, refund 49 SEK)
  6. refund_decision → triage: AgentReport { decision: "auto-approve", summary }

triage incorporates the report:

  1. triage runs LLM turn 3, summarizes the outcome
  2. triage → user: { status: "completed", summary: "..." }

A few observations:

  • refund_decision has its own context bundle. The parent’s context isn’t inherited; the sub-agent gets a fresh six-layer assembly with its own system prompt and characteristics. Only the delegated context (layer 4) carries data from the parent.
  • The sub-agent has its own tool list. refund_decision can call recall_memory (because its YAML has memory_config.long_term_enabled: true). triage can’t — not in its tool list.
  • Long-term memory is per-agent. The recall query is filtered structurally by agent_id = 'agent-refund-decision'; triage’s memories (if it had any) wouldn’t show up.
  • Events are emitted from the deepest agent that has the context to make the decision. refund_decision decides; it emits. triage could also have emitted an event, but in the current scenario it just summarizes.

What the platform defends against, and what it doesn’t.

Prompt injection via tool output, stored memory, or delegated task. The six-layer model is specifically designed for this. An attacker who lands malicious text in a memory entry can influence layers 5–6 at most; layers 1–2 (the agent’s identity and hard constraints) are unreachable.

Sub-agent runaway recursion. Each agent has an autonomy.max_delegation_depth configured. The runtime tracks delegation depth across the call chain and refuses to go deeper.

Budget exceedance. Time and cost budgets are enforced at iteration boundaries, not just per LLM call. A turn that would blow through its budget gets stopped and surfaces a TurnBudgetExceededError with the partial state captured in the report.
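
One plausible shape for that error, reusing the assumed AgentReport from the turn-loop sketch:

```ts
// Illustrative: the error carries the partial report so callers can surface it (assumed shape).
class TurnBudgetExceededError extends Error {
  constructor(public readonly partialReport: AgentReport) {
    super(
      `turn budget exceeded after ${partialReport.tool_calls.length} tool call(s), ` +
        `$${partialReport.cost.toFixed(4)} spent`,
    );
    this.name = "TurnBudgetExceededError";
  }
}
```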

Cross-agent memory leakage. Long-term memory is structurally scoped by tenant_id and agent_id via Vectorize metadata indexes. The agent itself cannot specify these — the runtime composes them from the context bundle. A buggy or malicious agent cannot retrieve another agent’s or another tenant’s memories.
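
A sketch of that composition, reusing the assumed Env from the handler sketch. Vectorize's metadata filter and D1's prepare/bind/all are real APIs; the surrounding shapes are illustrative:

```ts
// Illustrative: the runtime, not the agent, composes the scope filter.
declare function embed(text: string): Promise<number[]>; // OpenAI text-embedding-3-small, 1536 dims

async function recallMemory(
  env: Env,
  scope: { tenant_id: string; agent_id: string }, // taken from the context bundle, never from the LLM
  query: string,
) {
  const vector = await embed(query);
  const { matches } = await env.LT_MEMORY.query(vector, {
    topK: 5,
    filter: { tenant_id: scope.tenant_id, agent_id: scope.agent_id }, // Vectorize metadata indexes
  });
  if (matches.length === 0) return [];
  // Hydrate the full rows from D1, the source of truth
  const placeholders = matches.map(() => "?").join(",");
  const { results } = await env.DB
    .prepare(`SELECT * FROM long_term_memory WHERE id IN (${placeholders})`)
    .bind(...matches.map((m) => m.id))
    .all();
  return results;
}
```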

Cross-tenant queue leakage. Each event payload includes a tenant_id derived from the originating context. Consumers filter on it.

Tool whitelist violations. An agent can only call tools listed in its YAML’s tools.allowed. Even if the LLM tries to call a tool that exists in the registry but isn’t whitelisted for this agent, the runtime throws AutonomyBoundaryError before the tool runs.
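
In sketch form (assumed shapes; only the error name comes from the platform's taxonomy):

```ts
// Illustrative: the whitelist gate runs before any tool executes.
declare class AutonomyBoundaryError extends Error {}

function checkToolAllowed(def: { name: string; tools: { allowed: string[] } }, toolName: string): void {
  // The tool may exist in the registry, but this agent's YAML must whitelist it
  if (!def.tools.allowed.includes(toolName)) {
    throw new AutonomyBoundaryError(`tool "${toolName}" is not in tools.allowed for ${def.name}`);
  }
}
```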

What’s NOT defended (by design or by deferral)

LLM jailbreaks within allowed behavior. An agent’s system prompt says “don’t auto-approve refunds over $500.” A clever adversarial input might convince the LLM to approve a $499.99 refund that it shouldn’t. The platform can’t prevent within-the-allowed-policy errors; the policy itself has to be robust. Approval gates (Phase 2) move the policy enforcement out of the LLM and into a deterministic check.

Shopify-side side effects without idempotency. Today’s shopify_actions queue consumer is logs-only, so this doesn’t matter yet. Phase 2 introduces real mutations and idempotency keys per event.id so a delivered-twice event doesn’t refund twice.
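
One plausible shape for that Phase 2 consumer, reusing the assumed Env and PlatformEvent from the handler sketch. Borrowing JOBS_INDEX as the dedupe store is purely illustrative; any KV namespace or a D1 table would do:

```ts
// Illustrative Phase 2 sketch: dedupe on event.id so a delivered-twice event mutates once.
declare function performShopifyMutation(event: PlatformEvent, env: Env): Promise<void>;

async function consumeShopifyAction(msg: Message<PlatformEvent>, env: Env): Promise<void> {
  const key = `idem:shopify:${msg.body.id}`; // idempotency key per event.id
  if (await env.JOBS_INDEX.get(key)) {
    msg.ack(); // already handled on a previous delivery; do not refund twice
    return;
  }
  await performShopifyMutation(msg.body, env);
  await env.JOBS_INDEX.put(key, "done", { expirationTtl: 7 * 24 * 60 * 60 }); // keep a week
  msg.ack();
  // (KV is eventually consistent; a stricter gate could live in D1 instead.)
}
```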

Resource exhaustion attacks (DoS). A flood of /run requests from a hostile client would cost real money in LLM calls before being rate-limited. Cloudflare’s edge rate limits sit in front; per-tenant rate limits in the platform are a Phase 4 concern.

Sub-agents that share an LLM provider have a shared blast radius. If Anthropic has an outage, every agent that uses Anthropic stops working. The ModelAdapter abstraction lets us fail over to a different provider, but we don’t do that automatically.

Memory poisoning by an authorized writer. A compromised agent (one whose YAML or environment is subverted) can write malicious entries into long-term memory that get retrieved by itself or by other agents in its tenant. The defense is at the supply-chain level — only trusted operators publish agent YAMLs, only trusted code runs in the Worker.

Operational principles, not architectural choices, but they shape the implementation:

The platform doesn’t store credit cards, full SSNs, or full passport numbers. Tools that interact with payment systems (Phase 2) will reference tokenized identifiers, not raw values. Worker secrets carry API keys; nothing else.

The platform doesn’t log secrets. The @agent-platform/logger package has a redaction pass that masks any field name matching api_key, authorization, password, token, secret. The redaction is opt-out — the default is “if it looks like a secret, redact it.”
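
A minimal sketch of such a pass (the real @agent-platform/logger surely differs in detail):

```ts
// Illustrative redaction pass: mask any field whose name looks secret-bearing.
const SECRET_FIELD = /(api_key|authorization|password|token|secret)/i;

function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) =>
        SECRET_FIELD.test(k) ? [k, "[REDACTED]"] : [k, redact(v)],
      ),
    );
  }
  return value;
}
```

So a field like shopify_token would be masked too, because the name contains token; the match is on field names, not values.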

The platform doesn’t trust unauthenticated input. The HTTP API requires a Bearer token on every endpoint except /health. Cron triggers come from Cloudflare directly (no HTTP, no header check needed; the source is structural).
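
In sketch form (API_TOKEN is an assumed Worker secret name, not necessarily the real one):

```ts
// Illustrative auth gate: /health is the one open endpoint; everything else needs a Bearer token.
function authorize(request: Request, env: { API_TOKEN: string }): Response | null {
  if (new URL(request.url).pathname === "/health") return null; // open
  const header = request.headers.get("Authorization");
  if (header !== `Bearer ${env.API_TOKEN}`) {
    return new Response("unauthorized", { status: 401 });
  }
  return null; // authenticated; continue to routing
}
```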

The platform doesn’t write to systems it doesn’t own. Phase 1’s only write paths are to Cloudflare D1 + Vectorize (internal) and to log streams. Shopify mutations are queued, not executed (Phase 2). External systems are read-only.

Phase 1’s measured numbers, projected forward:

| Dimension | Today | At 10K agent runs/day | At 1M agent runs/day |
| --- | --- | --- | --- |
| LLM cost | ~$0.05/run (Sonnet + Haiku) | ~$500/day | ~$50K/day |
| Embedding cost | ~$0.0000004/recall | negligible | ~$5/day |
| D1 read ops | ~5/run | 50K/day (free tier covers 5M) | 5M/day (free tier covers; bursty) |
| Vectorize queries | ~1/recall | 10K/day | 1M/day |
| Worker requests | 1/run sync, 2/run async | well under free tier | $5-50 above free tier |
| D1 storage | ~10 KB/agent definition + memory rows | KB/MB scale | GB scale |
| Vectorize storage | 1536 dims × ~100 entries | 1.5M stored dimensions | 1.5B stored dimensions ($75/mo) |

Phase 1 doesn’t approach any of these limits. The shapes matter for Phase 4 multi-tenant; the early signal is that Cloudflare’s free tiers cover surprising scale, and the dominant cost at every horizon is LLM calls.

If you’re still ramping:

  • Testing for what’s tested, what’s not, and the testing-fidelity gap we hit twice in Phase 1
  • Scenarios (commits 5–7) for end-to-end walkthroughs of order-triage, merchandising, and the B2B SaaS hypothetical that uses every feature

If you want decision-level depth: