ADR-0024: Deployment model — Cloudflare Workers as first target, runtime-agnostic core
Status: Accepted
Date: 2026-04-23
Context
With the tool registry and first runnable CLI app shipped (ADR-0023), the platform has everything it needs to execute an agent — but nothing deployable. `apps/example` is a Node CLI; it demonstrates the shape but cannot accept HTTP requests, run on a schedule, or handle webhook-triggered work.
Making the platform deployable forces decisions that have been pending:
- What runs the Worker. ADR-0014 named Cloudflare Workers aspirationally but no code ran on it. This ADR commits.
- Sync vs async execution model. LLM calls take 1-3s each; multi-step agent turns take 10-30s. The 30-second HTTP request shape that web apps default to doesn’t work for realistic agent workloads. We need an async path.
- Where state lives across request boundaries. Long-running jobs need durable storage. Durable Objects (DOs)? D1? KV? Each has very different costs and access patterns.
- How much of the platform becomes Cloudflare-specific. Every Worker-ism that leaks into `packages/*` makes future platform migration expensive.
Separately, Ganimarka becomes the concrete first customer in this session — which raises the stakes. Abstract architecture is now real store automation. And a potential future customer (vetzoo.se, 150K products, actual traffic) raises the scale question: do Cloudflare's limits hurt us at scale?
Decision
Cloudflare Workers is the first deployment target. The platform core remains runtime-agnostic.
Three sub-decisions land together:
(1) Runtime-agnostic platform core
Every package under `packages/*` uses only:

- `fetch()`, `Request`, `Response` — standard Web APIs available in Workers, Node 18+, Deno, Bun
- `AbortController`, `AbortSignal` — same
- `Date`, `setTimeout` — same
- No `process.env`, no `fs`, no Node-specific imports, no Cloudflare-specific imports
Platform-specific code lives exclusively in `apps/*`. Today that means `apps/worker` (Cloudflare) and `apps/example` (Node CLI). Both run the same core packages unchanged.
Why this matters: the commitment to Workers is now a deployment choice, not an architecture choice. If we migrate — to Node/AWS, Deno Deploy, Vercel Functions, Fly.io containers — we rewrite one app folder and one ADR. Everything else comes with us.
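As a concrete illustration of the invariant, here is a minimal sketch of a core function built only from those primitives. The dependency shape and endpoint URL are assumptions for illustration; only `createAgentRuntime` is a name this ADR actually uses.

```ts
// Sketch only: illustrates the runtime-agnostic invariant, not the actual
// package API. Everything used here is a standard Web API; there is no
// process.env, no fs, and no Cloudflare binding in sight.

// Hypothetical dependency shape: the host app injects anything
// platform-specific (config values, a fetch override for tests).
export interface AgentRuntimeDeps {
  apiKey: string;            // the host reads this from its own config source
  fetchImpl?: typeof fetch;  // defaults to the global fetch
}

export function createAgentRuntime(deps: AgentRuntimeDeps) {
  const doFetch = deps.fetchImpl ?? fetch;

  return {
    // One LLM call: plain fetch plus AbortSignal, nothing runtime-specific.
    async callModel(prompt: string, signal?: AbortSignal): Promise<string> {
      const res = await doFetch("https://llm.example.com/v1/complete", {
        method: "POST",
        headers: {
          authorization: `Bearer ${deps.apiKey}`,
          "content-type": "application/json",
        },
        body: JSON.stringify({ prompt }),
        signal,
      });
      if (!res.ok) throw new Error(`LLM call failed: ${res.status}`);
      const data = (await res.json()) as { completion: string };
      return data.completion;
    },
  };
}
```

The Node CLI passes `process.env` values in at construction; the Worker passes its bindings. Both hosts adapt at the edge, and the core never knows which runtime it is on.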
(2) Two execution modes: sync and async
Sync mode (`POST /run`): appropriate for short turns. The request waits for the agent to finish and receives the `AgentReport`. Bound by Cloudflare's wall-time limits, which are generous for I/O-bound work (LLM calls are I/O, not CPU — the 30s CPU limit effectively doesn't apply).
Async mode (`POST /jobs` + `GET /jobs/:id`): for anything longer than the sync window, or anything triggered non-interactively (cron, webhook, bulk work). Submission returns immediately with a `job_id`; clients poll or receive callbacks when the job completes.
Both modes use the same agent runtime (`createAgentRuntime`). The difference is where the turn executes — inline in the HTTP request, or inside a Durable Object's alarm handler.
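A sketch of what this looks like at the HTTP boundary, assuming the thin Hono wiring described under “Consequences for the repo”. The handler `runAgentTurn`, the DO route URLs, and the payload shapes are hypothetical.

```ts
import { Hono } from "hono";

type Env = { AGENT_JOB: DurableObjectNamespace };

// Hypothetical pure handler, standing in for the real src/handlers.ts code.
declare function runAgentTurn(input: unknown): Promise<Record<string, unknown>>;

const app = new Hono<{ Bindings: Env }>();

// Sync mode: run the turn inline and return the AgentReport.
app.post("/run", async (c) => {
  const report = await runAgentTurn(await c.req.json());
  return c.json(report);
});

// Async mode: one DO per job; hand it the record, return the job_id at once.
app.post("/jobs", async (c) => {
  const jobId = crypto.randomUUID();
  const stub = c.env.AGENT_JOB.get(c.env.AGENT_JOB.idFromName(jobId));
  await stub.fetch("https://do/submit", {
    method: "POST",
    body: JSON.stringify({ jobId, input: await c.req.json() }),
  });
  return c.json({ job_id: jobId }, 202);
});

// Status poll: idFromName(job_id) resolves to the same DO every time.
app.get("/jobs/:id", async (c) => {
  const stub = c.env.AGENT_JOB.get(c.env.AGENT_JOB.idFromName(c.req.param("id")));
  const res = await stub.fetch("https://do/status");
  return c.json((await res.json()) as Record<string, unknown>);
});

export default app;
```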
(3) One Durable Object per async job
`AgentJob` is a DO class. Each async submission creates one DO instance (via `idFromName(job_id)`). The DO receives the `JobRecord`, schedules an alarm for “now,” and returns 202 to the entry worker. The alarm fires, the DO runs the agent turn via `executeJob`, and persists the final record.
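A sketch of that flow, under stated assumptions: the route paths, the `JobRecord` fields, and the `executeJob` signature are illustrative, not the real `apps/worker` code.

```ts
// Hypothetical stand-in for the real executeJob, which runs the agent turn.
declare function executeJob(job: Record<string, unknown>): Promise<unknown>;

export class AgentJob implements DurableObject {
  constructor(private state: DurableObjectState) {}

  // The entry worker calls this once per submission, then again for polls.
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    if (request.method === "POST" && url.pathname === "/submit") {
      const record = (await request.json()) as Record<string, unknown>;
      await this.state.storage.put("job", { ...record, status: "queued" });
      await this.state.storage.setAlarm(Date.now()); // schedule "now"
      return new Response(null, { status: 202 });
    }
    // Status poll: return whatever the stored record says.
    const job = await this.state.storage.get("job");
    return Response.json(job ?? { status: "not_found" });
  }

  // The alarm fires outside the submitting HTTP request, so the turn is not
  // bound by that request's lifetime.
  async alarm(): Promise<void> {
    const job = (await this.state.storage.get("job")) as Record<string, unknown>;
    await this.state.storage.put("job", { ...job, status: "running" });
    const result = await executeJob(job);
    await this.state.storage.put("job", { ...job, status: "done", result });
  }
}
```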
Why one-per-job:
- No coordination cost. Each job runs in its own DO; no locks, no routing, no head-of-line blocking.
- Stable lookup. `idFromName(job_id)` gives any worker the same DO — status polls hit the same instance regardless of which edge location handles the request.
- Simple lifecycle. DO exists → job is queued/running/done. Cleanup sweeps are orthogonal.
- Matches the job-submission mental model developers already have from Sidekiq, Celery, and AWS Batch.
Rejected alternatives:
- One DO per user or per tenant. Good for conversation continuity — a single user’s multi-turn chat lives in one DO. Premature today; we don’t have multi-turn chat and we don’t have users yet.
- One DO coordinator for all jobs. Single DO receives all submissions, tracks them in its storage. Serializes work that doesn’t need to be serial; hot-spots a single instance. No upside we need.
- KV for job state, no DO. Works for fire-and-forget, but DOs are needed anyway to escape the 30s HTTP limit. DO + alarm + `storage.put` is strictly simpler than DO + KV + alarm.
Accepted drawback: orphaned completed DOs accumulate. Each is tiny (one record) and Cloudflare charges minimally for idle DOs, but at scale this needs a sweeper. Not blocking today; explicitly tracked as future work.
The question of Cloudflare’s limits at scale
A fair concern was raised: vetzoo.se has 150K products and real traffic. Will Workers hurt us there?
Honest assessment of the real Cloudflare limits:
- 30s CPU limit — mostly a non-issue. Agent work is I/O-bound (LLM calls); the CPU time is sub-second even for complex turns. Only becomes a problem for actual in-Worker computation (CSV parsing, image processing). Mitigation: DO alarm handler runs for minutes, no CPU cap in the same sense.
- Subrequest limit — 50 on the free tier, 1000 on paid. An agent with many tool calls approaches this faster than expected. Mitigation: an explicit limit in the runtime (`MAX_ITERATIONS` — currently 10), and a “chunk long work across multiple jobs” pattern for things like product-rewrite loops (see the sketch after this list).
- 128 MB memory per invocation — fine for agent work, tight for batch data processing. Mitigation: stream results, don’t accumulate.
- DO pricing at scale — potentially surprising. At Ganimarka volume (zero sales today), free. At vetzoo volume, needs measurement before commitment. Mitigation: we measure.
- No stable region / IP — enterprise compliance concern. Cloudflare has data-residency features (Jurisdictional Restrictions) but they’re paid plans. Mitigation: if a customer’s compliance needs this, we’d move that customer’s workload to a traditional cloud target. See the migration story below.
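The chunking pattern referenced in the subrequest bullet, sketched from the submitter's side. The chunk size, task name, and payload shape are assumptions.

```ts
// Hypothetical fan-out: instead of one job touching thousands of products
// (and blowing the subrequest budget), submit many small jobs, each with a
// tool-call count that stays comfortably under the limit.
const CHUNK_SIZE = 25;

async function submitInChunks(productIds: string[], baseUrl: string) {
  const jobIds: string[] = [];
  for (let i = 0; i < productIds.length; i += CHUNK_SIZE) {
    const chunk = productIds.slice(i, i + CHUNK_SIZE);
    const res = await fetch(`${baseUrl}/jobs`, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ task: "rewrite-descriptions", productIds: chunk }),
    });
    const { job_id } = (await res.json()) as { job_id: string };
    jobIds.push(job_id);
  }
  return jobIds; // poll GET /jobs/:id per chunk, or sweep the results later
}
```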
None of these are Ganimarka-blocking. All become relevant at vetzoo-class scale or with enterprise compliance needs.
Revisit triggers — explicit conditions under which we reconsider Workers:
- Subrequest limits routinely bite a production workload even after chunking.
- A customer demands data-residency guarantees Cloudflare doesn’t offer in our price tier.
- DO costs at measurable volume (say, 10K jobs/day) exceed comparable AWS Lambda+DynamoDB costs by >2×.
- Debugging production incidents takes meaningfully longer on Workers than a typical Node-on-containers setup would.
- A library we genuinely need is Node-only with no Workers port on the horizon (Puppeteer, native bindings, etc.).
If any of these fire, we port apps/worker to a different target (Node on Fly.io, Bun on Railway, AWS Lambda, etc.). The cost of that port is bounded — one app folder, one ADR, some ops work. The platform core is already portable by construction (sub-decision 1).
Consequences
- Platform is deployable. `wrangler deploy` publishes. Sync and async requests work. Real HTTP, real DOs, real observability.
- Ganimarka automation can ship. Session 11’s work (first real Ganimarka agent) now has somewhere to run.
- Runtime-agnostic property committed. Future migration, if needed, costs one app folder. Not zero, but bounded and predictable.
- No long-term commitment to Cloudflare. The ADR names specific revisit triggers; if one fires, we port.
- Deferred work is explicit, not implicit. Cron, webhooks, queues, WebSocket streaming, DO cleanup — each tracked with a driving forcing function.
Consequences for the repo
- New app: `apps/worker` (~400 lines including README, `wrangler.toml`, tests).
- Handlers are pure functions (`src/handlers.ts`); the Hono wiring in `src/index.ts` is thin.
- Tests: 18 new, covering request parsing, sync run, job records, error serialization, and `executeJob` against a MockAdapter. Workspace total: 394 passed + 2 skipped.
- `@cloudflare/workers-types` and `hono` added as deps; `wrangler` as a devDep.
- No changes to `packages/*` — the runtime-agnostic invariant is preserved in practice, not just in design.
Alternatives considered
- Ship sync-only, defer async. Shorter session. Rejected: Ganimarka’s real automations (bulk rewrites, scheduled campaign planning, webhook triggers) are all async by nature; shipping sync-only would mean session 11 immediately hits the wall.
- Ship async via Queues instead of DOs. Cloudflare Queues are a fan-out primitive; a DO alarm is a per-job scheduler. For “one job, one agent run, one result record,” DO-per-job is simpler. Queues are the right fit for later fan-out work (process 100 products concurrently). Not mutually exclusive; just sequenced.
- Ship a Node/Express backend instead. Full Node ecosystem, no subrequest limits, any library works. Costs $50/month minimum to keep a backend up, vs Cloudflare’s free tier. More ops work. Rejected as premature — we can always move if reasons emerge, and the core is portable.
- Hybrid: edge Workers for sync, Node backend for async. Two platforms to operate. Rejected — one platform end-to-end is simpler; if we need Node later, we can migrate async first and keep sync on Workers, or go all-Node.
- Use Durable Objects for per-user state now (long-lived per-tenant DOs). Nice property for multi-turn conversation continuity. Premature — we have no users and no multi-turn conversations yet. When memory ships, we revisit.
- Don’t commit the runtime-agnostic property. Would let us use Workers-specific APIs directly in packages (e.g. access DO bindings). Small code-writing wins; makes migration genuinely painful. Rejected decisively.
What’s Next
Session 11 picks up with the first real Ganimarka automation using this deployment target. Likely candidates:
- Product description rewriter. Reads Shopify products, generates better Swedish + English descriptions, saves drafts. Bulk background job via `/jobs`. This forces the Shopify tool to exist and exercises the async path at realistic volume.
- Weekly campaign planner. Cron-triggered agent that proposes a banner + landing-page angle for the coming week. Forces cron trigger support (a small addition to `wrangler.toml`; see the sketch after this list).
- Abandoned cart follow-up. Webhook-triggered agent that drafts a personalized message. Forces the webhook handling shape.
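For the cron candidate, the shape would be a `scheduled` handler added to the Worker, with the cron expression in `wrangler.toml` under `[triggers]`. A sketch, assuming the DO submission route from earlier; the task name and schedule are placeholders.

```ts
type Env = { AGENT_JOB: DurableObjectNamespace };

export default {
  // Runs when a cron from wrangler.toml fires, e.g. crons = ["0 6 * * 1"].
  async scheduled(controller: ScheduledController, env: Env, ctx: ExecutionContext) {
    const jobId = `campaign-${controller.scheduledTime}`;
    const stub = env.AGENT_JOB.get(env.AGENT_JOB.idFromName(jobId));
    // waitUntil keeps the invocation alive until the submission completes.
    ctx.waitUntil(
      stub.fetch("https://do/submit", {
        method: "POST",
        body: JSON.stringify({ jobId, task: "weekly-campaign-plan" }),
      }),
    );
  },
};
```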
My recommendation for session 11 is (1). Ganimarka has weak supplier copy today; fixing 40 product descriptions would be a concrete, visible improvement and would exercise the platform under realistic bulk-job load.