ADR-0014: Cloudflare Workers as the runtime platform
Status: Accepted
Date: 2026-04-21
Context
The “runtime platform” question has been listed as open since the monorepo was scaffolded. Every subsequent decision (the logger choice, the secret-management story, the observability plumbing, the database client, the HTTP framework) depends on where agent code actually runs. With ADR-0013 now defining what “enterprise-ready” means for this project, deferring the platform decision further would mean writing enterprise-ready components against an abstract target, which is the worst of both worlds: abstract enough to be wrong, concrete enough to have to redo.
The options considered in the open question were:
- Cloudflare Workers + Durable Objects. Named as the default in the earliest planning sessions. Per-agent persistent state fits the Durable Object primitive naturally. Global low-latency dispatch, scale-to-zero billing, integrated storage (D1, KV, R2, Vectorize), integrated logging (Workers Logs), and tail workers for observability without extra infrastructure.
- Node.js long-running (Fly.io / Railway / ECS). Maximum ecosystem compatibility. Every npm package works. No platform-specific constraints on CPU time, memory, or code size. But: cold starts, per-instance billing, separate logging / metrics / secrets stack to choose and operate.
- Hybrid (Workers for fast paths, Node for long-running). Two runtimes to understand, two deployment pipelines, two sets of platform quirks. The implied benefits only materialize at a scale we do not yet have.
What changed the calculus
- SQLite-backed Durable Objects are generally available. Durable Objects are now a production primitive for per-agent state, not a beta.
- The project’s step-by-step development preference. A platform that requires operating more infrastructure (Node + orchestrator + log aggregation + metrics backend + secrets manager) means more that can break during each step. Workers collapses most of that into the platform.
- First vertical is e-commerce, consumed via HTTPS. The workload is request/response with occasional background tasks (via Queues or Durable Object alarms), not long-running batch jobs. This is the Workers sweet spot.
- Cost model at the current stage. Scale-to-zero matters when the platform is pre-revenue. A $5/month Workers Paid plan covers everything we need today; a Node deployment that is “always on” starts at ~$20/month for a single small instance and adds separately-billed services for logging, metrics, and secrets.
What has not changed
- The code is platform-agnostic where it can be. TypeScript targets ES2022 (ADR-0002); the core and schemas packages have no runtime dependencies beyond Zod; the runtime package uses only `structuredClone` and `Object.freeze`, both native on both platforms. Moving off Workers later would be a deployment change, not a rewrite.
- The storage choice is still open (open-questions.md#storage-primitives). Committing to Workers narrows it (D1 and KV become defaults rather than options) but does not foreclose it: Hyperdrive would allow Postgres from Workers if we later decided to.
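The first point above leans on `structuredClone` and `Object.freeze` being built into both Workers and Node. A minimal sketch of what immutable state handling with only those two built-ins might look like follows. The `AgentState` shape and the `deepFreeze` and `snapshot` helpers are hypothetical names chosen for illustration, not taken from the runtime package.

```typescript
// Hypothetical state shape, for illustration only.
interface AgentState {
  readonly agentId: string;
  readonly memory: { notes: string[] };
}

// Recursively freeze an object graph using only Object.freeze.
function deepFreeze<T>(value: T): T {
  if (typeof value === "object" && value !== null && !Object.isFrozen(value)) {
    Object.freeze(value);
    for (const child of Object.values(value as object)) {
      deepFreeze(child);
    }
  }
  return value;
}

// Take an independent, immutable copy of the given state.
function snapshot<T>(state: T): T {
  return deepFreeze(structuredClone(state));
}

const live = { agentId: "agent-1", memory: { notes: ["first"] } };
const frozen: AgentState = snapshot(live);

// Mutating the live object afterwards does not affect the snapshot.
live.memory.notes.push("second");
```

Because neither built-in is platform-specific, code written this way should run unchanged in Vitest's Node environment and inside a Worker.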
Decision
The Agent Platform runs on Cloudflare Workers. Specifically:
- Agent runtime code is deployed as Workers modules (ES modules format).
- Per-agent persistent state lives in Durable Objects (SQLite-backed, given its GA status and superior introspection story compared to the KV storage backend).
- Storage primitives default to the Cloudflare stack (D1 for relational, KV for cache / working memory, Vectorize for long-term memory embeddings, R2 for files). Each specific choice still needs its own ADR when the component that uses it ships; this ADR only sets the default from which to argue.
- The `nodejs_compat` compatibility flag is enabled in `wrangler.toml` for any Worker that benefits from Node built-ins. We do not lean on it for core abstractions.
- The `compatibility_date` for every Worker is pinned to a specific date and bumped deliberately, not floated.
- Local development uses Wrangler’s built-in dev server; tests that specifically exercise Worker behavior use `@cloudflare/vitest-pool-workers` (deferred; see Consequences below).
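To make “Workers modules (ES modules format)” concrete, here is a rough sketch of the shape such a module might take. It uses only the standard `Request`, `Response`, and `URL` APIs, so it can be exercised outside the Workers runtime. The `Env` interface, the `ENVIRONMENT` binding, and the `/health` route are assumptions made up for this example, not part of the decision.

```typescript
// Hypothetical bindings for illustration; real bindings are declared in wrangler.toml.
interface Env {
  readonly ENVIRONMENT?: string;
}

const worker = {
  // The module-format entry point: one fetch handler per incoming request.
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    if (request.method === "GET" && url.pathname === "/health") {
      const body = JSON.stringify({ ok: true, environment: env.ENVIRONMENT ?? "unknown" });
      return new Response(body, {
        status: 200,
        headers: { "content-type": "application/json" },
      });
    }
    return new Response("Not found", { status: 404 });
  },
};

// In a Worker entry file this object would be the module's default export:
//   export default worker;
```

Keeping the handler as a plain object means it can be called directly with a constructed `Request` in a Node-based test, which fits the deferred Worker-pool testing described under Consequences.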
Consequences
- Every subsequent platform-shaped ADR has a concrete target. The logger ADR can specify that it works within Workers’ `console` / Workers Logs pipeline rather than abstracting over three backends. The secret-management ADR can specify Worker secrets (`wrangler secret put`) rather than “some secrets manager.” The HTTP-framework ADR (currently open) narrows to Hono or a raw `fetch` handler; Fastify and Elysia are off the table for Workers.
- The `packages/core` and `packages/schemas` promises are preserved. Neither package depends on anything Workers-specific and neither will. The Workers-specific code lives in `apps/*` and in `packages/runtime` only to the extent the runtime needs to call platform APIs (bindings, Workers Logs, Durable Object stubs). Business Packs stay platform-agnostic to the maximum extent possible: vertical logic is not Workers logic.
- We take on platform-specific constraints. CPU-time budget per request (currently 30s on Workers Paid, more on Standard Unbound but on a different price model), 128 MiB memory per isolate, code-size caps, `eval` disallowed. Every component ADR that could bump into one of these limits states so explicitly.
- Workers Logs is the default log sink. Bar 7 from ADR-0013 (“structured logs”) is satisfied by emitting `console.log(JSON.stringify(...))` into the Workers Logs pipeline. The logger wrapper (future ADR) abstracts the call so tests do not need Workers to run.
- Observability is mostly solved out of the box. Request logs, CPU-time metrics, invocation counts, and exception captures are provided by the platform. Custom metrics go via Workers Analytics Engine. This means bars 5 and 6 from ADR-0013 (LLM-call trace, audit record) are additions on top of an already-capable substrate rather than an observability stack to build from scratch.
- Testing inside the Workers runtime is deferred, with a trigger. `@cloudflare/vitest-pool-workers` requires Vitest 4.1+; we currently pin Vitest 3.2.4 (ADR-0004). Today’s tests (62 in the workspace) exercise platform-agnostic logic and run in Vitest’s Node environment, which is correct for what they test. The trigger to revisit is the addition of the first component whose behavior depends on a Workers-specific API (a binding, Durable Object lifecycle, Queues, KV). At that point ADR-0004 is superseded by an ADR that bumps Vitest and adds `@cloudflare/vitest-pool-workers`. Platform-agnostic packages (`core`, `schemas`, current `runtime`) keep running in the Node pool regardless; only Worker-specific packages run in the Worker pool.
- Deployment is Wrangler-based. `wrangler deploy` per Worker. CI/CD (open-questions.md#cicd) is still an open question in the “what deploys and when” sense, but the deploy tool is no longer variable.
- Business Packs remain the escape valve. If a future vertical’s workload is fundamentally not Workers-shaped (long-running Python ML inference, say), that pack can be a separate service the platform calls over the network. The core platform stays on Workers; verticals that do not fit become integrations, not rewrites.
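The logger wrapper is left to a future ADR, so the following is only a sketch of one shape it could take. It emits one JSON line per event through `console.log` by default and accepts an injected sink so tests can capture output without the Workers runtime. `createLogger`, `LogSink`, and the field names are hypothetical, not a settled design.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";
type LogFields = Record<string, unknown>;
// A sink receives one serialized JSON line per log event.
type LogSink = (line: string) => void;

interface Logger {
  log(level: LogLevel, message: string, fields?: LogFields): void;
}

// The default sink is console.log; tests can inject their own sink and clock.
function createLogger(
  sink: LogSink = (line) => console.log(line),
  now: () => Date = () => new Date(),
): Logger {
  return {
    log(level, message, fields = {}) {
      // Caller fields are spread first so they cannot overwrite the reserved keys.
      sink(JSON.stringify({ ...fields, level, message, timestamp: now().toISOString() }));
    },
  };
}

// Usage in a test: capture lines in memory and use a fixed clock.
const lines: string[] = [];
const testLogger = createLogger((line) => lines.push(line), () => new Date(0));
testLogger.log("info", "agent.step", { agentId: "agent-1" });
```

Injecting the sink is one way to keep the platform-agnostic packages free of any Workers dependency while still producing the structured lines the platform expects.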
Consequences for the repo
- `apps/*` will contain Workers, each with its own `wrangler.toml` and `compatibility_date`.
- `packages/runtime` will grow a thin Workers-platform shim only when needed (e.g. reading from a Durable Object binding). The shim is behind an interface so non-Worker test environments can substitute it.
- `tsconfig.base.json` already targets ES2022 with `DOM` and `DOM.Iterable` libs (ADR-0002); this is compatible with Workers’ V8 isolate. No change needed.
- `pnpm-workspace.yaml` does not change.
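As an illustration of “the shim is behind an interface”, the sketch below defines a small storage interface, an in-memory substitute for non-Worker tests, and a piece of platform-agnostic logic that depends only on the interface. All names here (`AgentStateStore`, `InMemoryAgentStateStore`, `incrementCounter`) are hypothetical. A Workers-backed implementation would wrap a Durable Object binding and is deliberately not shown, since it would only run inside the Workers runtime.

```typescript
// Hypothetical shim interface: the only surface platform-agnostic code sees.
interface AgentStateStore {
  get(key: string): Promise<string | undefined>;
  put(key: string, value: string): Promise<void>;
}

// In-memory substitute for tests that run outside the Workers runtime.
class InMemoryAgentStateStore implements AgentStateStore {
  private readonly data = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.data.get(key);
  }

  async put(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

// Platform-agnostic logic: reads, increments, and writes back a counter.
async function incrementCounter(store: AgentStateStore, key: string): Promise<number> {
  const current = Number((await store.get(key)) ?? "0");
  const next = current + 1;
  await store.put(key, String(next));
  return next;
}
```

With this split, only the Workers-backed implementation would need the Worker test pool; everything written against the interface can stay in the Node pool.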
Alternatives considered
- Node.js on Fly.io or Railway. The most portable option. Rejected because portability is not the problem we need to solve today: every requirement we have is met by Workers, and the operational complexity of running Node (logging, metrics, secrets, HA, cold-start mitigation, process supervision) is work we would have to do ourselves. The enterprise-readiness bar from ADR-0013 is easier to meet on a platform that provides audit-grade logging and secret management out of the box than on one where we assemble it from parts.
- Hybrid Workers + Node. Considered for the case where a future workload genuinely doesn’t fit Workers. Rejected for Phase 1 because we do not have such a workload. If one appears, it becomes a specific, scoped integration in a Business Pack — not a cross-cutting architectural choice.
- AWS Lambda. Comparable serverless semantics to Workers but with a substantially heavier platform (VPC, IAM, CloudWatch, Secrets Manager, API Gateway for HTTP). The “integrated platform” argument that favors Workers favors it over Lambda even more sharply.
- Bun runtime on a VPS. Fast, Node-compatible, good DX. Rejected for the same reason Node is rejected: we would still be operating the infrastructure ourselves, plus Bun is a newer runtime with a smaller production track record in the contexts that matter (edge deployment, managed observability).
- Stay on “no decision” and make platform-agnostic abstractions. The status quo. Rejected because ADR-0013 requires every subsequent component to meet a concrete enterprise bar, and you cannot meet a concrete bar with abstract plumbing. “Works on any platform” means “tested on none.”