ADR-0014: Cloudflare Workers as the runtime platform

Status: Accepted
Date: 2026-04-21

The “runtime platform” question has been listed as open since the monorepo was scaffolded. Every subsequent decision — the logger choice, the secret-management story, the observability plumbing, the database client, the HTTP framework — depends on where agent code actually runs. With ADR-0013 now defining what “enterprise-ready” means for this project, deferring the platform decision further would mean writing enterprise-ready components against an abstract target, which is the worst of both worlds: abstract enough to be wrong, concrete enough to have to redo.

The options considered in the open question were:

  1. Cloudflare Workers + Durable Objects. Named as the default in the earliest planning sessions. Per-agent persistent state fits the Durable Object primitive naturally. Global low-latency dispatch, scale-to-zero billing, integrated storage (D1, KV, R2, Vectorize), integrated logging (Workers Logs), and tail workers for observability without extra infrastructure.
  2. Node.js long-running (Fly.io / Railway / ECS). Maximum ecosystem compatibility. Every npm package works. No platform-specific constraints on CPU time, memory, or code size. The costs: per-instance billing, cold starts if instances are allowed to scale to zero, and a separate logging / metrics / secrets stack to choose and operate.
  3. Hybrid (Workers for fast paths, Node for long-running). Two runtimes to understand, two deployment pipelines, two sets of platform quirks. The implied benefits only materialize at a scale we do not yet have.
  • SQLite-backed Durable Objects are generally available. Durable Objects are now a production primitive for per-agent state, not a beta.
  • The project’s step-by-step development preference. A platform that requires operating more infrastructure (Node + orchestrator + log aggregation + metrics backend + secrets manager) means more that can break during each step. Workers collapses most of that into the platform.
  • First vertical is e-commerce, consumed via HTTPS. The workload is request/response with occasional background tasks (via Queues or Durable Object alarms), not long-running batch jobs. This is the Workers sweet spot.
  • Cost model at the current stage. Scale-to-zero matters when the platform is pre-revenue. A $5/month Workers Paid plan covers everything we need today; a Node deployment that is “always on” starts at ~$20/month for a single small instance and adds separately-billed services for logging, metrics, and secrets.
  • The code is platform-agnostic where it can be. TypeScript targets ES2022 (ADR-0002); the core and schemas packages have no runtime dependencies beyond Zod; the runtime package uses only structuredClone and Object.freeze, both native on both platforms. Moving off Workers later would be a deployment change, not a rewrite.
  • The storage choice is still open (open-questions.md#storage-primitives). Committing to Workers narrows it (D1 and KV become defaults rather than options) but does not foreclose it: Hyperdrive would let Workers connect to Postgres if we later chose to use it.
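
The portability point above rests on the runtime package using only structuredClone and Object.freeze. A minimal sketch of what such platform-agnostic code could look like follows. The `deepFreeze` and `snapshot` helpers and the `AgentState` shape are hypothetical names chosen for illustration, not the project's actual API.

```typescript
// Hypothetical illustration: an immutable snapshot helper that relies only on
// structuredClone and Object.freeze, both native in Workers and in recent Node.

/** Recursively freezes plain objects and arrays, then returns the same value. */
function deepFreeze<T>(value: T): Readonly<T> {
  if (typeof value === "object" && value !== null && !Object.isFrozen(value)) {
    Object.freeze(value);
    for (const child of Object.values(value)) {
      deepFreeze(child);
    }
  }
  return value;
}

/** Returns a frozen deep copy, so later changes to `state` cannot leak in. */
function snapshot<T>(state: T): Readonly<T> {
  return deepFreeze(structuredClone(state));
}

// Hypothetical state shape, used only for this example.
interface AgentState {
  agentId: string;
  memory: { notes: string[] };
}

const live: AgentState = { agentId: "agent-1", memory: { notes: ["hello"] } };
const frozen = snapshot(live);

live.memory.notes.push("added later"); // changes the live copy, not the snapshot
```

Nothing in this sketch touches a Workers binding or a Node built-in module, which is the property that would make a later move a deployment change rather than a rewrite.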

The Agent Platform runs on Cloudflare Workers. Specifically:

  • Agent runtime code is deployed as Workers modules (ES modules format).
  • Per-agent persistent state lives in Durable Objects (SQLite-backed, given its GA status and superior introspection story compared to the KV storage backend).
  • Storage primitives default to the Cloudflare stack (D1 for relational, KV for cache / working memory, Vectorize for long-term memory embeddings, R2 for files). Each specific choice still needs its own ADR when the component that uses it ships; this ADR only sets the default from which to argue.
  • The nodejs_compat compatibility flag is enabled in wrangler.toml for any Worker that benefits from Node built-ins. We do not lean on it for core abstractions.
  • The compatibility_date for every Worker is pinned to a specific date and bumped deliberately, not floated.
  • Local development uses Wrangler’s built-in dev server; tests that specifically exercise Worker behavior use @cloudflare/vitest-pool-workers (deferred — see Consequences below).
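
As a rough sketch of the first two points, a module-format Worker could route each request to the Durable Object that owns the agent's state. The `AGENT` binding name and the `/agents/<agentId>` URL scheme are assumptions made for this example, and the small interfaces are local stand-ins for the platform's type definitions so the snippet is self-contained. `idFromName`, `get`, and the stub's `fetch` are the real Durable Object namespace and stub methods.

```typescript
// Local structural stand-ins for the Workers types used below. In a real
// Worker these would come from Cloudflare's type definitions instead.
interface StubLike {
  fetch(request: Request): Promise<Response>;
}
interface NamespaceLike {
  idFromName(name: string): unknown;
  get(id: unknown): StubLike;
}
// Hypothetical binding name; it would be declared in the Worker's wrangler.toml.
interface Env {
  AGENT: NamespaceLike;
}

const worker = {
  // Module-format entry point: forward /agents/<agentId>/... to that agent's
  // Durable Object, so all state for one agent lives behind one object.
  async fetch(request: Request, env: Env): Promise<Response> {
    const match = new URL(request.url).pathname.match(/^\/agents\/([^/]+)/);
    if (!match) {
      return new Response("not found", { status: 404 });
    }
    const id = env.AGENT.idFromName(match[1]);
    return env.AGENT.get(id).fetch(request);
  },
};

// In a deployed Worker module this object would be the default export:
//   export default worker;
```

Because the handler depends only on the shape of `env`, it can be exercised with a fake namespace in the Node test pool until Worker-pool tests are introduced.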

Consequences

  • Every subsequent platform-shaped ADR has a concrete target. The logger ADR can specify that it works within Workers’ console/Workers Logs pipeline rather than abstracting over three backends. The secret-management ADR can specify Worker secrets (wrangler secret put) rather than “some secrets manager.” The HTTP-framework ADR (currently open) narrows to Hono or a raw fetch handler; Fastify and Elysia are off the table for Workers.
  • The packages/core and packages/schemas promises are preserved. Neither package depends on anything Workers-specific and neither will. The Workers-specific code lives in apps/* and in packages/runtime only to the extent the runtime needs to call platform APIs (bindings, Workers Logs, Durable Object stubs). Business Packs stay platform-agnostic to the maximum extent possible — vertical logic is not Workers logic.
  • We take on platform-specific constraints. These include a CPU-time budget per invocation (on Workers Paid, 30 s by default and configurable up to 5 minutes through the cpu_ms limit in the Wrangler configuration), 128 MiB of memory per isolate, code-size caps, and no eval. Every component ADR that could bump into one of these limits states so explicitly.
  • Workers Logs is the default log sink. Bar 7 from ADR-0013 (“structured logs”) is satisfied by emitting console.log(JSON.stringify(...)) into the Workers Logs pipeline. The logger wrapper (future ADR) abstracts the call so tests do not need Workers to run.
  • Observability is mostly solved out of the box. Request logs, CPU-time metrics, invocation counts, and exception captures are provided by the platform. Custom metrics go via Workers Analytics Engine. This means bars 5 and 6 from ADR-0013 (LLM-call trace, audit record) are additions on top of an already-capable substrate rather than an observability stack to build from scratch.
  • Testing inside the Workers runtime is deferred, with a trigger. @cloudflare/vitest-pool-workers requires Vitest 4.1+; we currently pin Vitest 3.2.4 (ADR-0004). Today’s tests (62 in the workspace) exercise platform-agnostic logic and run in Vitest’s Node environment, which is correct for what they test. The trigger to revisit is the addition of the first component whose behavior depends on a Workers-specific API (a binding, Durable Object lifecycle, Queues, KV). At that point ADR-0004 is superseded by an ADR that bumps Vitest and adds @cloudflare/vitest-pool-workers. Platform-agnostic packages (core, schemas, current runtime) keep running in the Node pool regardless; only Worker-specific packages run in the Worker pool.
  • Deployment is Wrangler-based. wrangler deploy per Worker. CI/CD (open-questions.md#cicd) is still an open question in the “what deploys and when” sense, but the deploy tool is no longer variable.
  • Business Packs remain the escape valve. If a future vertical’s workload is fundamentally not Workers-shaped (long-running Python ML inference, say), that pack can be a separate service the platform calls over the network. The core platform stays on Workers; verticals that do not fit become integrations, not rewrites.
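
The structured-logging consequence above can be sketched as a thin wrapper around console.log(JSON.stringify(...)) with an injectable sink. This is only an illustration under assumptions: `createLogger`, `LogSink`, and the field names are hypothetical, and the real wrapper is left to the future logger ADR.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";
type LogSink = (line: string) => void;

interface Logger {
  log(level: LogLevel, message: string, fields?: Record<string, unknown>): void;
}

// Hypothetical factory. The default sink is console.log, the call the ADR
// names as the way into the Workers Logs pipeline; tests pass their own sink.
function createLogger(
  base: Record<string, unknown> = {},
  sink: LogSink = (line) => console.log(line),
): Logger {
  return {
    log(level, message, fields = {}) {
      // One JSON object per line. level, message, and ts are spread last so
      // caller-supplied fields cannot overwrite them.
      sink(
        JSON.stringify({
          ...base,
          ...fields,
          level,
          message,
          ts: new Date().toISOString(),
        }),
      );
    },
  };
}

// Test-style usage: capture lines in memory instead of writing to the console.
const lines: string[] = [];
const logger = createLogger({ service: "agent-runtime" }, (line) => lines.push(line));
logger.log("info", "agent started", { agentId: "agent-1" });
```

Injecting the sink is what lets the Node-pool tests assert on log output without a Workers runtime.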
  • apps/* will contain Workers, each with its own wrangler.toml and compatibility_date.
  • packages/runtime will grow a thin Workers-platform shim only when needed (e.g. reading from a Durable Object binding). The shim is behind an interface so non-Worker test environments can substitute it.
  • tsconfig.base.json already targets ES2022 with DOM and DOM.Iterable libs (ADR-0002); this is compatible with Workers’ V8 isolate. No change needed.
  • pnpm-workspace.yaml does not change.
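
One possible shape for the thin shim in packages/runtime is sketched below. The `AgentStateStore` interface and both factory functions are hypothetical names. `KeyValueStorageLike` is a local structural subset of the Durable Object storage `get`/`put` methods, declared here so the example does not depend on platform types.

```typescript
// The interface the rest of packages/runtime would depend on (hypothetical).
interface AgentStateStore {
  load<T>(key: string): Promise<T | undefined>;
  save<T>(key: string, value: T): Promise<void>;
}

// Structural subset of the Durable Object storage key-value methods.
interface KeyValueStorageLike {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}

// Workers-side shim: inside a Durable Object class it would wrap ctx.storage.
function durableObjectStore(storage: KeyValueStorageLike): AgentStateStore {
  return {
    load<T>(key: string): Promise<T | undefined> {
      return storage.get<T>(key);
    },
    save<T>(key: string, value: T): Promise<void> {
      return storage.put(key, value);
    },
  };
}

// Test-side substitute: needs no Workers runtime. Values are cloned on the
// way in and out so callers cannot mutate stored state by reference.
function inMemoryStore(): AgentStateStore {
  const data = new Map<string, unknown>();
  return {
    async load<T>(key: string): Promise<T | undefined> {
      return data.has(key) ? (structuredClone(data.get(key)) as T) : undefined;
    },
    async save<T>(key: string, value: T): Promise<void> {
      data.set(key, structuredClone(value));
    },
  };
}
```

Callers hold an `AgentStateStore` and never see which implementation is behind it, which keeps the Workers-specific code confined to the one factory.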
  • Node.js on Fly.io or Railway. The most portable option. Rejected because portability is not the problem we need to solve today: every requirement we have is met by Workers, and the operational complexity of running Node (logging, metrics, secrets, HA, cold-start mitigation, process supervision) is work we would have to do ourselves. The enterprise-readiness bar from ADR-0013 is easier to meet on a platform that provides audit-grade logging and secret management out of the box than on one where we assemble them from parts.
  • Hybrid Workers + Node. Considered for the case where a future workload genuinely doesn’t fit Workers. Rejected for Phase 1 because we do not have such a workload. If one appears, it becomes a specific, scoped integration in a Business Pack — not a cross-cutting architectural choice.
  • AWS Lambda. Comparable serverless semantics to Workers but with a substantially heavier platform (VPC, IAM, CloudWatch, Secrets Manager, API Gateway for HTTP). Rejected because the “integrated platform” argument that favors Workers over Node applies even more sharply against Lambda.
  • Bun runtime on a VPS. Fast, Node-compatible, good DX. Rejected for the same reason Node is rejected: we would still be operating the infrastructure ourselves, plus Bun is a newer runtime with a smaller production track record in the contexts that matter (edge deployment, managed observability).
  • Stay on “no decision” and make platform-agnostic abstractions. The status quo. Rejected because ADR-0013 requires every subsequent component to meet a concrete enterprise bar, and you cannot meet a concrete bar with abstract plumbing. “Works on any platform” means “tested on none.”