Packages

The platform is a pnpm monorepo with 16 packages under packages/. This is the engineering tour: what each one does, what depends on it, and the design decisions that shaped it.

Packages divide into four layers, mirroring the platform’s architecture:

  • Foundations — types, errors, schemas, logging, config. No domain logic. Every other package depends on at least some of these.
  • Adapters — provider-agnostic interfaces plus their concrete implementations. LLM adapters, embedding adapters.
  • Runtime — the agent execution layer: context assembler, tool registry, memory subsystem, event bus, agent runtime, YAML loader, built-in tools.
  • Business pack — vertical-specific. Today only Shopify.

Each package’s section below covers four things: what it does, who depends on it, the most interesting design decision, and where to look for the entry point.

core

The platform’s type vocabulary. Defines the shapes that flow through every other package: AgentDefinition, Task, AgentReport, MemoryEntry, LongTermMemory (the gateway interface), Tool, ToolResolver, the six Context layers, and the branded ID types (AgentId, JobId, EventId, ToolId).

Everything else in the platform either implements one of these interfaces or accepts one as input. There’s no business logic here — core is the lingua franca that lets packages compose without circular dependencies.

Depends on: errors only. Depended on by: every other package. Design choice: types are interface and type only — no classes exposed at the package boundary. Concrete classes live in their own packages and are returned by factory functions. This keeps core zero-runtime-cost; importing it doesn’t pull in any implementation. Entry point: packages/core/src/index.ts.
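As a rough illustration of how branded IDs can stay type-only, here is a minimal sketch. The branding technique (a unique-symbol key) and the helper names asAgentId, asJobId, and describeAgent are assumptions made for this example, not the package's actual definitions.

```typescript
// Hypothetical sketch of branded ID types; packages/core may brand its IDs
// differently. The brand exists only in the type system and emits no code.
declare const brand: unique symbol;

type Brand<T, Name extends string> = T & { readonly [brand]: Name };

type AgentId = Brand<string, "AgentId">;
type JobId = Brand<string, "JobId">;

// Assumed factory helpers: the only place a raw string becomes an ID.
function asAgentId(raw: string): AgentId {
  return raw as AgentId;
}

function asJobId(raw: string): JobId {
  return raw as JobId;
}

// Accepts only an AgentId; a JobId or a plain string is a compile error.
function describeAgent(id: AgentId): string {
  return `agent:${id}`;
}

const agentId = asAgentId("merchandising");
const jobId = asJobId("job-123");

const agentLabel = describeAgent(agentId);
const jobLabel = `job:${jobId}`;
// describeAgent(jobId);  // compile error: JobId is not assignable to AgentId
// describeAgent("raw");  // compile error: string is not assignable to AgentId
```

At runtime both IDs are ordinary strings, which fits the zero-runtime-cost goal described above.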

errors

The error taxonomy. AgentPlatformError is the base class; concrete subclasses cover the categories the platform actually distinguishes between: AgentDefinitionError, BudgetExceeded, ContextOverrideAttempt, ToolExecutionError, LLMError, StorageError, and a handful more.

Every error thrown by the platform is one of these. Catch handlers can use instanceof AgentPlatformError to separate platform errors from unexpected ones; further instanceof checks distinguish budget overruns from validation failures from network blips.

Depends on: nothing. Depended on by: every other package. Design choice: errors carry structured context in a details field, not just a message. When the runtime catches a ToolExecutionError, it logs the tool name, the agent, the input that failed validation, and the underlying cause separately. This is what makes production debugging tractable. Entry point: packages/errors/src/index.ts.
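The instanceof filtering and the structured details field might look roughly like the sketch below. The class names come from the text; the constructor signature, the shape of details, and the classify helper are assumptions.

```typescript
// Hypothetical sketch: class names follow the text, but the constructor
// signature and the shape of `details` are assumptions.
class AgentPlatformError extends Error {
  readonly details: Record<string, unknown>;

  constructor(message: string, details: Record<string, unknown> = {}) {
    super(message);
    // Keep instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
    this.details = details;
  }
}

class ToolExecutionError extends AgentPlatformError {}
class BudgetExceeded extends AgentPlatformError {}

// Classify a caught value the way a top-level handler might.
function classify(err: unknown): string {
  if (!(err instanceof AgentPlatformError)) return "unexpected";
  if (err instanceof BudgetExceeded) return "budget";
  if (err instanceof ToolExecutionError) {
    return `tool:${String(err.details["tool"])}`;
  }
  return "platform";
}

// Structured context travels in `details`, separate from the message.
const toolFailure = new ToolExecutionError("tool input failed validation", {
  tool: "recall_memory",
  agent: "order-triage",
  input: { query: 42 },
});
```

A handler can then log toolFailure.details field by field instead of parsing a message string.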

schemas

Zod schemas for the trust boundary. Every value that crosses from “untrusted input” to “platform internal” passes through a schema in this package: HTTP request bodies, queue messages, agent YAML files (after parsing), tool inputs.

The schemas are the single source of truth for runtime validation; corresponding TypeScript types are derived via z.infer<typeof Schema>. We never write a TS type that mirrors a Zod schema by hand, because hand-written mirrors drift out of sync with the schema.

Depends on: core for branded IDs and shared types. Depended on by: every package that handles untrusted input. Design choice: schemas are exported individually (no barrel exports of “all schemas”) so consumers pull only what they need. Tree-shaking matters in a Workers bundle. Entry point: packages/schemas/src/index.ts.

logger

Structured logging. Defines the Logger interface (info, warn, error, debug) plus two concrete implementations: JsonLogger (production; writes one JSON object per line to console.log) and MemoryLogger (tests; collects entries in an in-memory array for assertion).

Loggers carry bindings — a key-value context that gets merged into every log line. The Worker creates a logger with { trace_id, request_id } at the top of a request and passes it down; every nested operation adds its own keys.

Depends on: nothing. Depended on by: every package that does I/O. Design choice: redaction is built in. A list of sensitive key names (api_key, authorization, password, etc.) gets replaced with [REDACTED] before serialization. This catches the common case where a debug log accidentally captures a header with a Bearer token. Entry point: packages/logger/src/index.ts.
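A minimal sketch of bindings plus redaction follows, modelled on the MemoryLogger idea. The child method, the entries field, and the exact sensitive-key list are assumptions; the sketch redacts top-level keys only.

```typescript
// Hypothetical sketch of bindings and redaction; the `child` method, the
// `entries` field, and the key list are assumptions, not the package's API.
type Fields = Record<string, unknown>;

const SENSITIVE_KEYS = new Set(["api_key", "authorization", "password"]);

// Replace the value of any sensitive top-level key before the entry is kept.
function redact(fields: Fields): Fields {
  const out: Fields = {};
  for (const key of Object.keys(fields)) {
    out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : fields[key];
  }
  return out;
}

class MemoryLogger {
  constructor(
    private readonly bindings: Fields = {},
    readonly entries: Fields[] = [],
  ) {}

  // Returns a logger that shares the entry array but carries extra bindings.
  child(extra: Fields): MemoryLogger {
    return new MemoryLogger({ ...this.bindings, ...extra }, this.entries);
  }

  info(message: string, fields: Fields = {}): void {
    this.entries.push(redact({ level: "info", message, ...this.bindings, ...fields }));
  }
}

const root = new MemoryLogger({ trace_id: "t-1", request_id: "r-1" });
const toolLog = root.child({ tool: "recall_memory" });
toolLog.info("calling upstream", { authorization: "Bearer secret-token" });
```

The single recorded entry carries trace_id, request_id, and tool from the bindings, and its authorization value is replaced with [REDACTED].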

config

Typed access to environment variables and Worker secrets. Wraps the chaotic process.env / env.MY_SECRET patterns in a single function: readSecret(env, 'ANTHROPIC_API_KEY') returns a typed string, throws ConfigError if missing.

Doesn’t try to be a configuration framework. Doesn’t load files, doesn’t merge sources, doesn’t watch for changes. Just: “give me this secret, fail loudly if it’s not there.”

Depends on: errors. Depended on by: apps/worker, apps/example, anywhere that reads secrets. Design choice: secret reading is a function call at the use site, not a global config object. Globals get stale on Workers between deploys; explicit reads at the use site stay correct. Entry point: packages/config/src/index.ts.
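A use-site secret read could be as small as the sketch below. readSecret and ConfigError are named in the text; the function body, the Env type, and the decision to treat an empty string as missing are assumptions. The real ConfigError presumably extends the platform base error rather than Error directly.

```typescript
// Hypothetical sketch: readSecret and ConfigError are named in the text, but
// this body and the `Env` type are assumptions.
class ConfigError extends Error {
  constructor(readonly key: string) {
    super(`Missing required secret: ${key}`);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "ConfigError";
  }
}

type Env = Record<string, unknown>;

// Read one secret at the use site; fail loudly if it is absent or empty.
function readSecret(env: Env, key: string): string {
  const value = env[key];
  if (typeof value !== "string" || value.length === 0) {
    throw new ConfigError(key);
  }
  return value;
}

const env: Env = { ANTHROPIC_API_KEY: "sk-test" };
const apiKey = readSecret(env, "ANTHROPIC_API_KEY");
```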

llm

The provider-agnostic LLM interface. Defines ModelAdapter (the interface every concrete provider implements), the request/response shapes (ModelRequest, ModelResponse, ToolCall, StopReason), and the error taxonomy (LLMAuthError, LLMRateLimitError, LLMOverloadedError, LLMUnavailableError, LLMInvalidRequestError).

Also ships MockAdapter — a scriptable fake used in tests. Tests don’t call real LLMs; they instantiate MockAdapter with a sequence of canned responses.

Depends on: core, errors. Depended on by: runtime, llm-anthropic, every package that exercises agent loops in tests. Design choice: the adapter interface is intentionally minimal: one complete() call returns a single ModelResponse with possible tool calls. Streaming is not in the interface — it adds complexity that the runtime doesn’t currently exploit and that not every provider supports the same way. Entry point: packages/llm/src/index.ts.
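To make the single complete() call and the scripted mock concrete, here is a minimal sketch. The type names come from the text, but every field on them, the requests array, and the script-exhausted behavior are assumptions.

```typescript
// Hypothetical sketch: the type names come from the text, but every field
// shown here is an assumption about their shape.
type StopReason = "end_turn" | "tool_use";

interface ToolCall {
  id: string;
  name: string;
  input: unknown;
}

interface ModelRequest {
  messages: { role: "user" | "assistant"; content: string }[];
}

interface ModelResponse {
  text: string;
  toolCalls: ToolCall[];
  stopReason: StopReason;
}

interface ModelAdapter {
  complete(request: ModelRequest): Promise<ModelResponse>;
}

// Replays canned responses in order and records every request it receives.
class MockAdapter implements ModelAdapter {
  readonly requests: ModelRequest[] = [];
  private readonly queue: ModelResponse[];

  constructor(script: ModelResponse[]) {
    this.queue = script.slice();
  }

  complete(request: ModelRequest): Promise<ModelResponse> {
    this.requests.push(request);
    const next = this.queue.shift();
    if (next === undefined) {
      return Promise.reject(new Error("MockAdapter script exhausted"));
    }
    return Promise.resolve(next);
  }
}

const adapter = new MockAdapter([
  {
    text: "",
    toolCalls: [{ id: "call-1", name: "recall_memory", input: { query: "refunds" } }],
    stopReason: "tool_use",
  },
  { text: "All done.", toolCalls: [], stopReason: "end_turn" },
]);
```

A test drives the agent loop against this adapter and then asserts on adapter.requests to check what the runtime actually sent.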

llm-anthropic

The concrete Anthropic implementation of ModelAdapter. Wraps @anthropic-ai/sdk, translates Anthropic’s error responses into the platform’s LLMError taxonomy, and enforces budget caps (max tokens, max cost) before each call.

Pricing tables for each model are hard-coded here so cost estimation can run without a network call.

Depends on: llm, errors, logger, @anthropic-ai/sdk. Depended on by: apps/worker, apps/example. Design choice: the SDK’s tool-use response format gets transformed eagerly into the platform’s flat ToolCall shape. The runtime never sees Anthropic’s wire format. This is what makes swapping providers a one-package change. Entry point: packages/llm-anthropic/src/index.ts.
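The eager flattening step can be sketched as follows. The text and tool_use content-block shapes follow Anthropic's Messages API; the flat ToolCall shape, the FlatResponse type, and the flattenContent name are assumptions about the platform side, and the sketch handles only those two block types.

```typescript
// Sketch of the eager flattening step. The block shapes follow Anthropic's
// Messages API; ToolCall, FlatResponse, and flattenContent are hypothetical.
type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface ToolCall {
  id: string;
  name: string;
  input: unknown;
}

interface FlatResponse {
  text: string;
  toolCalls: ToolCall[];
}

// Split the mixed content array into plain text and a flat tool-call list.
function flattenContent(blocks: AnthropicBlock[]): FlatResponse {
  const textParts: string[] = [];
  const toolCalls: ToolCall[] = [];
  for (const block of blocks) {
    if (block.type === "text") {
      textParts.push(block.text);
    } else {
      toolCalls.push({ id: block.id, name: block.name, input: block.input });
    }
  }
  return { text: textParts.join(""), toolCalls };
}

const flat = flattenContent([
  { type: "text", text: "Looking that up." },
  {
    type: "tool_use",
    id: "toolu_01",
    name: "shopify_get_order_by_email",
    input: { email: "a@example.com" },
  },
]);
```

Because the runtime only ever receives the flat shape, a second provider would need its own flattening function and nothing else.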

embeddings

The provider-agnostic embedding interface, a sibling to llm. Defines EmbeddingAdapter (the contract: embed(strings) → { vectors }), the request/response shapes, and a MockEmbeddingAdapter for tests.

Smaller surface than llm — embeddings are a one-function API.

Depends on: errors. Depended on by: embeddings-openai, memory, anywhere that exercises recall in tests. Design choice: the adapter exposes the model’s dimension count as a config field. Consumers (notably the long-term memory gateway) read this to verify their Vectorize index dimension matches the embedder’s output. Mismatches fail at adapter construction, not at first query. Entry point: packages/embeddings/src/index.ts.
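A construction-time dimension check might look like the sketch below. The dimensions field name, the indexDimensions parameter, and the VectorMemoryGateway class are assumptions; only the fail-at-construction behavior comes from the text.

```typescript
// Hypothetical sketch: `dimensions`, `indexDimensions`, and the class names
// are assumptions; only the fail-at-construction behavior is from the text.
interface EmbeddingAdapter {
  readonly dimensions: number;
  embed(texts: string[]): Promise<{ vectors: number[][] }>;
}

class MockEmbeddingAdapter implements EmbeddingAdapter {
  constructor(readonly dimensions: number) {}

  embed(texts: string[]): Promise<{ vectors: number[][] }> {
    // Deterministic zero vectors of the configured width.
    const vectors = texts.map(() => new Array<number>(this.dimensions).fill(0));
    return Promise.resolve({ vectors });
  }
}

class VectorMemoryGateway {
  constructor(
    readonly embedder: EmbeddingAdapter,
    readonly indexDimensions: number,
  ) {
    // Compare widths up front so a mismatch never reaches the first query.
    if (embedder.dimensions !== indexDimensions) {
      throw new Error(
        `Embedder emits ${embedder.dimensions} dimensions but the index expects ${indexDimensions}`,
      );
    }
  }
}

const gateway = new VectorMemoryGateway(new MockEmbeddingAdapter(8), 8);
```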

embeddings-openai

The OpenAI concrete implementation of EmbeddingAdapter. Uses raw fetch (no SDK dependency); translates OpenAI errors into the platform’s LLMError taxonomy.

The single most cautionary file in the codebase: this is where the Workers fetch this-binding bug shipped to production (commit 11 hotfix). The fix and the comment explaining it live at the top of the constructor.

Depends on: embeddings, llm (for the shared error taxonomy), errors. Depended on by: apps/worker (when long-term memory is enabled). Design choice: raw fetch instead of the OpenAI SDK. The SDK has Node-specific assumptions and a large dep graph that doesn’t fit Workers cleanly. Trade-off: we re-implement small bits (request shape, error parsing) instead of getting them free. Worth it for the bundle size and runtime simplicity. Entry point: packages/embeddings-openai/src/index.ts.
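The text does not spell out the bug itself. The sketch below assumes it is the common Workers failure mode in which fetch is stored on an object and then called as a method, so the runtime sees the wrong this. strictFetch is a hypothetical stand-in that imitates that check; BrokenClient and FixedClient are illustrative names.

```typescript
// Sketch of a common Workers failure mode, assumed (not confirmed by the
// text) to be the bug in question. `strictFetch` is a stand-in that mimics
// a runtime rejecting calls made with the wrong `this`.
type FetchLike = (url: string) => string;

function strictFetch(this: unknown, url: string): string {
  if (this !== undefined && this !== globalThis) {
    throw new TypeError("Illegal invocation");
  }
  return `fetched ${url}`;
}

class BrokenClient {
  constructor(private readonly fetchImpl: FetchLike) {}

  get(url: string): string {
    // Method-call syntax passes the client instance as `this`.
    return this.fetchImpl(url);
  }
}

class FixedClient {
  private readonly fetchImpl: FetchLike;

  constructor(fetchImpl: FetchLike) {
    // The arrow wrapper makes the inner call a plain function call, so the
    // client instance never becomes `this`.
    this.fetchImpl = (url) => fetchImpl(url);
  }

  get(url: string): string {
    return this.fetchImpl(url);
  }
}

const fixedResult = new FixedClient(strictFetch).get("https://example.com");
```

With the real global, binding it once (fetch.bind(globalThis)) is an equivalent fix.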

tools

The ToolRegistry: the in-memory lookup table that maps a tool name to its handler. Built once per request; immutable thereafter.

The runtime asks the registry “is this tool name allowed for this agent?” and “give me the handler for this tool name.” Both are O(1) Map lookups.

Depends on: core, errors. Depended on by: runtime, apps/worker. Design choice: registries are immutable after construction. You can’t add a tool mid-turn. This prevents a class of bugs where dynamic registration races with concurrent invocations. The smallest package in the monorepo (~120 lines). Entry point: packages/tools/src/index.ts.
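An immutable, Map-backed registry could be sketched like this. The method names isAllowed and getHandler, the ToolEntry shape, and the per-tool allow-list are assumptions; the text states only the two lookups and the immutability.

```typescript
// Hypothetical sketch: method names and the allow-list shape are assumptions;
// the text only states Map-backed lookups and immutability.
type ToolHandler = (input: unknown) => unknown;

interface ToolEntry {
  name: string;
  handler: ToolHandler;
  allowedAgents: string[];
}

class ToolRegistry {
  private readonly handlers: ReadonlyMap<string, ToolHandler>;
  private readonly allowed: ReadonlyMap<string, ReadonlySet<string>>;

  // Everything is supplied up front; there is deliberately no register().
  constructor(entries: ToolEntry[]) {
    const handlers = new Map<string, ToolHandler>();
    const allowed = new Map<string, ReadonlySet<string>>();
    for (const entry of entries) {
      if (handlers.has(entry.name)) {
        throw new Error(`Duplicate tool name: ${entry.name}`);
      }
      handlers.set(entry.name, entry.handler);
      allowed.set(entry.name, new Set(entry.allowedAgents));
    }
    this.handlers = handlers;
    this.allowed = allowed;
    Object.freeze(this);
  }

  isAllowed(agentId: string, toolName: string): boolean {
    const agents = this.allowed.get(toolName);
    return agents !== undefined && agents.has(agentId);
  }

  getHandler(toolName: string): ToolHandler | undefined {
    return this.handlers.get(toolName);
  }
}

const registry = new ToolRegistry([
  { name: "echo", handler: (input) => input, allowedAgents: ["order-triage"] },
]);
```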

event-bus

Async coordination over Cloudflare Queues. Defines the EventBus interface (publish(topic, payload)), wraps the Workers Queue binding, and validates payloads against per-topic Zod schemas.

Two concrete impls: CloudflareEventBus (production) and MockEventBus (tests). The mock collects published events in an array; tests assert on the array.

Depends on: core, errors, schemas. Depended on by: runtime (via the emit_event built-in tool), apps/worker. Design choice: topic schemas are registered at construction. The bus refuses to publish to an unknown topic — there’s no way to silently drop a typo’d topic name. ADR-0032 covers the full async-coordination decision. Entry point: packages/event-bus/src/index.ts.
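The refuse-unknown-topics rule can be sketched with a mock bus. To stay dependency-free, plain validator functions stand in for the per-topic Zod schemas and publish() is synchronous; both are simplifications, and the topic name order.flagged is hypothetical.

```typescript
// Hypothetical sketch: plain validator functions stand in for the per-topic
// Zod schemas, and publish() is synchronous here for brevity.
type Validator = (payload: unknown) => boolean;

interface PublishedEvent {
  topic: string;
  payload: unknown;
}

class MockEventBus {
  readonly published: PublishedEvent[] = [];
  private readonly topics: ReadonlyMap<string, Validator>;

  // Topics and their validators are fixed at construction.
  constructor(topics: [string, Validator][]) {
    this.topics = new Map(topics);
  }

  publish(topic: string, payload: unknown): void {
    const validate = this.topics.get(topic);
    if (validate === undefined) {
      throw new Error(`Unknown topic: ${topic}`);
    }
    if (!validate(payload)) {
      throw new Error(`Invalid payload for topic: ${topic}`);
    }
    this.published.push({ topic, payload });
  }
}

const isOrderFlagged: Validator = (payload) =>
  typeof payload === "object" &&
  payload !== null &&
  typeof (payload as { orderId?: unknown }).orderId === "string";

const bus = new MockEventBus([["order.flagged", isOrderFlagged]]);
bus.publish("order.flagged", { orderId: "1001" });
```

A typo such as order.flaged throws immediately instead of publishing into the void.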

memory

The memory subsystem. Implements both layers of the platform’s memory:

  • Working memory — the sliding-window message buffer for the current agent turn. WorkingMemoryGateway interface + a Durable-Object-backed implementation.
  • Long-term memory — persistent vector search per agent. LongTermMemory interface + a VectorizeBacked implementation that combines Vectorize for similarity search and D1 for content storage.

Each gateway is constructed with the dependencies it needs (a Vectorize binding, a D1 binding, an embedder) and exposes a narrow interface (store, search, delete).

Depends on: core, embeddings, errors, logger. Depended on by: apps/worker, runtime (for working memory). Design choice: D1 is the source of truth; Vectorize holds only embeddings + minimal metadata for filtering. If Vectorize is wiped, we can re-index from D1. The reverse isn’t true. This asymmetry shapes ADR-0030. Entry point: packages/memory/src/index.ts.

agent-loader

Reads YAML agent definition files from disk and validates them into typed AgentDefinition instances. Handles the prompt: field that points to a sibling Markdown file (so prompts live in their own files for readability).

Depends on: core, errors, schemas, yaml. Depended on by: apps/worker (which calls the loader at build time to bundle the agents into the Worker’s source). Design choice: YAML parsing happens at build time, not at request time. ADR-0033 covers the bundling decision: the Worker imports a generated agents.ts module that has every agent definition statically. No filesystem reads in the hot path. Entry point: packages/agent-loader/src/index.ts.

runtime

The agent execution engine. Three responsibilities:

  • Context assembly — given a Task and an AgentDefinition, build the six-layer context that the agent’s LLM turn will see. Enforces the validateNoOverride() invariant: no lower layer can overwrite a higher one.
  • Agent turn — run the LLM call, parse tool calls, dispatch to the tool registry, recurse on delegation tools, return a structured AgentReport.
  • Budget enforcement — token caps, time caps, delegation-depth caps. Trips and surfaces a typed error before runaway cost or recursion.

This is the most architecturally important package and the largest (~1400 lines of source, 80+ tests).

Depends on: every foundation + adapter package. Depended on by: apps/worker, apps/example. Design choice: delegation is implemented as a tool call. When agent A lists agent B as a sub-agent, the runtime synthesizes a delegate_to_B tool whose input is the task spec for B. The tool’s handler invokes the runtime recursively. This unification is what keeps the abstraction small — there’s no separate delegation API, just tools. Entry point: packages/runtime/src/index.ts.
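Delegation-as-a-tool, together with a depth cap, can be sketched as below. The delegate_to_<name> naming follows the text; makeRuntime, runAgent, toolsFor, the Tool shape, and MAX_DEPTH are assumptions, and a stand-in replaces the LLM's tool choice.

```typescript
// Hypothetical sketch: makeRuntime, runAgent, toolsFor, the Tool shape, and
// MAX_DEPTH are assumptions; the delegate_to_<name> naming is from the text.
interface AgentDef {
  name: string;
  subAgents: string[];
}

interface Tool {
  name: string;
  handler: (task: string) => string;
}

const MAX_DEPTH = 2;

function makeRuntime(agents: AgentDef[]) {
  const byName = new Map(agents.map((a): [string, AgentDef] => [a.name, a]));

  // One synthesized tool per sub-agent; its handler re-enters the runtime.
  function toolsFor(agent: AgentDef, depth: number): Tool[] {
    return agent.subAgents.map((sub) => ({
      name: `delegate_to_${sub}`,
      handler: (task: string) => runAgent(sub, task, depth + 1),
    }));
  }

  function runAgent(name: string, task: string, depth = 0): string {
    if (depth > MAX_DEPTH) throw new Error("delegation depth exceeded");
    const agent = byName.get(name);
    if (agent === undefined) throw new Error(`Unknown agent: ${name}`);
    // A real turn lets the LLM pick a tool; this stand-in always delegates
    // to the first sub-agent when one exists.
    const first = toolsFor(agent, depth)[0];
    return first ? `${name} > ${first.handler(task)}` : `${name} handled "${task}"`;
  }

  return { runAgent, toolsFor };
}

const linear = makeRuntime([
  { name: "A", subAgents: ["B"] },
  { name: "B", subAgents: [] },
]);
const linearResult = linear.runAgent("A", "check stock");
```

Since delegation is just another tool, the depth cap is the only delegation-specific logic in the loop.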

builtin-tools

The platform’s three built-in tools, which any agent can opt into:

  • recall_memory — semantic search over the agent’s long-term memory. Calls the LongTermMemory gateway with the agent’s ID for scoping.
  • store_memory — write a new entry to long-term memory.
  • emit_event — publish an event onto the bus for downstream processing.

Each tool is a function with a Zod input schema. The runtime auto-registers built-ins based on flags in the agent definition (memory_config.long_term_enabled enables recall_memory + store_memory; events.enabled enables emit_event).

Depends on: core, errors, event-bus, memory, schemas. Depended on by: apps/worker, runtime. Design choice: built-ins live in their own package, not in runtime. This keeps the runtime independent of memory and event bus implementations — the runtime doesn’t import either; it just sees a ToolResolver. Built-ins are wired in at the application layer (the worker). Entry point: packages/builtin-tools/src/index.ts.
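The flag-to-tool mapping described above amounts to a small pure function. The flag names and tool names come from the text; the AgentFlags type and the enabledBuiltins function name are assumptions.

```typescript
// Hypothetical sketch: the flag and tool names come from the text; the
// function name and the surrounding types are assumptions.
interface AgentFlags {
  memory_config?: { long_term_enabled?: boolean };
  events?: { enabled?: boolean };
}

type BuiltinToolName = "recall_memory" | "store_memory" | "emit_event";

// Decide which built-ins to register for one agent definition.
function enabledBuiltins(agent: AgentFlags): BuiltinToolName[] {
  const names: BuiltinToolName[] = [];
  if (agent.memory_config?.long_term_enabled === true) {
    names.push("recall_memory", "store_memory");
  }
  if (agent.events?.enabled === true) {
    names.push("emit_event");
  }
  return names;
}

const memoryOnly = enabledBuiltins({ memory_config: { long_term_enabled: true } });
const everything = enabledBuiltins({
  memory_config: { long_term_enabled: true },
  events: { enabled: true },
});
```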

shopify

The Shopify Admin API client. Currently single-tenant (custom-app token, not OAuth) and read-only: products, recent orders, shop info. Used by the merchandising agent (cron) and by the shopify_get_order_by_email tool (order-triage scenario).

OAuth + write paths ship in Phase 2 when the human-approval gate is designed. Today the platform reads from Shopify; it doesn’t mutate.

Depends on: errors, logger. Depended on by: apps/worker. Design choice: GraphQL queries are typed via co-located .graphql strings + custom request types, with no full schema-codegen pipeline. We ran into one production bug at deploy time (selectionMismatch on a missing field) that better tooling would have caught; the trade-off is documented as a cautionary tale in the package README. Entry point: packages/shopify/src/index.ts.

If you’re trying to learn the codebase, read the packages in this order:

  1. core — the type vocabulary. 30 minutes; sets your compass for everything else.
  2. errors + schemas — the trust-boundary primitives.
  3. llm — the agent’s LLM-shaped interface. Read MockAdapter first; production logic in llm-anthropic second.
  4. runtime — the heart of the platform. The context assembler is where the six-layer model lives.
  5. memory + event-bus — the I/O abstractions.
  6. apps/worker — the wiring layer. See Apps for that tour.

Skip tools, config, logger until you need them — they’re small enough to absorb in passing.