Cloudflare Workers
The platform’s main runtime. Every HTTP request, every cron tick, every queue message that the platform handles runs inside a Cloudflare Worker.
What we use it for
Three handler types in one Worker (apps/worker):
- fetch — the HTTP API. Routes for /health, /run (synchronous agent turns), /jobs (async submission + polling), and /admin/seed-memory. Bearer-token auth on everything except /health.
- scheduled — cron triggers. Today the only cron is the weekly merchandising agent, Mondays at 06:00 UTC. Cron-driven runs bypass auth (Cloudflare invokes the handler directly).
- queue — consumer for the async event bus. Reads from the human-review and shopify-actions queues; today logs-only, Phase 2 will wire real mutations.
A single Worker handles all three. One deploy, three entry points.
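The three-entry-point shape can be sketched as below. The handler names and routes follow this page; the Env binding, placeholder responses, and stand-in queue types are illustrative assumptions, not the real implementation in apps/worker/src/index.ts.

```typescript
// Illustrative shape of one Worker with three entry points.
// API_TOKEN, the placeholder bodies, and the stand-in types are assumptions.
interface Env {
  API_TOKEN: string; // bearer token required on every route except /health
}

// Minimal stand-ins for the queue-consumer types the Workers runtime provides.
interface QueueMessage { body: unknown; ack(): void }
interface MessageBatch { queue: string; messages: QueueMessage[] }

const worker = {
  // fetch — the HTTP API
  async fetch(request: Request, env: Env): Promise<Response> {
    const { pathname } = new URL(request.url);
    if (pathname === "/health") return new Response("ok");

    // Bearer-token auth on everything except /health
    if (request.headers.get("Authorization") !== `Bearer ${env.API_TOKEN}`) {
      return new Response("unauthorized", { status: 401 });
    }
    if (pathname === "/run") return new Response("agent turn result"); // placeholder
    if (pathname === "/jobs") return new Response("job accepted");     // placeholder
    if (pathname === "/admin/seed-memory") return new Response("seeded"); // placeholder
    return new Response("not found", { status: 404 });
  },

  // scheduled — cron triggers; Cloudflare invokes this directly, so no auth check
  async scheduled(_event: { cron: string }, _env: Env): Promise<void> {
    // "0 6 * * 1" → weekly merchandising agent, Mondays 06:00 UTC
  },

  // queue — consumer for the async event bus (human-review, shopify-actions)
  async queue(batch: MessageBatch, _env: Env): Promise<void> {
    for (const msg of batch.messages) {
      console.log(`[${batch.queue}]`, msg.body); // logs-only today
      msg.ack();
    }
  },
};

export default worker;
```

One object, three handlers: deploying it once registers all three entry points with the runtime.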
Why we picked it
The decision was about where the agent runtime lives. The real options were:
| Option | Verdict |
|---|---|
| Cloudflare Workers | Chosen. Edge cold start <10ms; native bindings to D1, Vectorize, Queues, Durable Objects, KV, R2. No infrastructure to manage. |
| AWS Lambda | Cold starts run hundreds of ms; adjacent services (DynamoDB, SQS, Pinecone) are separate billing/auth surfaces. More moving parts. |
| Long-running Node server (Fly, Render, Railway) | Need to run a server, monitor it, scale it. Loses the auto-scale-to-zero property. |
| Local-only (no deploy) | Defers the “does this work in production” question. Bad for a multi-tenant platform play. |
The deciding factors were edge cold-start latency (an agent turn already takes 10-30 seconds; you can’t add a 2-second cold start on top) and bindings (the Vectorize binding is a Worker-only API; using Vectorize from elsewhere means the public REST API plus auth tokens — a strict downgrade).
What it costs
Cloudflare Workers Paid plan: $5/month flat for the account, which includes 10M requests + 30M CPU-ms per month. Phase 1’s demo workload uses well under 0.1% of that.
CPU time is the only meaningful axis. Each agent turn is ~100ms of CPU (most of the wall time is awaiting upstream LLM/Shopify APIs, which doesn’t count). At the Workers Paid tier, CPU is free up to the cap and $0.02 per million CPU-ms after. Even at millions of agent turns per month the bill stays trivial.
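The arithmetic is easy to check. A sketch using only the numbers quoted above (the function is illustrative, not platform code):

```typescript
// Back-of-envelope cost model; every constant comes from the pricing above.
const PLAN_FLAT_USD = 5;              // Workers Paid, per account per month
const INCLUDED_CPU_MS = 30_000_000;   // CPU-ms included per month
const USD_PER_MILLION_CPU_MS = 0.02;  // overage price
const CPU_MS_PER_TURN = 100;          // ~100ms CPU per agent turn

function monthlyCostUsd(agentTurns: number): number {
  const cpuMs = agentTurns * CPU_MS_PER_TURN;
  const overageMs = Math.max(0, cpuMs - INCLUDED_CPU_MS);
  return PLAN_FLAT_USD + (overageMs / 1_000_000) * USD_PER_MILLION_CPU_MS;
}

// 1M turns/month → 100M CPU-ms → 70M over the cap → ≈ $6.40 total
```

Even a million agent turns a month only adds about $1.40 of CPU overage on top of the $5 flat fee.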
What it replaces
If this were a traditional architecture:
- An EC2 / GCE / Hetzner box running Node, plus monitoring
- An auto-scaler in front of it
- A separate vector DB cluster (Pinecone / Weaviate)
- A separate message broker (RabbitMQ / SQS)
- A separate cron service or in-process scheduler
- A separate KV cache (Redis) for working memory
- A load balancer
That stack is what most “production AI agent” tutorials assume. Workers + the surrounding Cloudflare services replace all of it with one bundle, one deploy, one bill.
Where to look
- apps/worker/wrangler.toml — the deploy config; lists every binding the Worker holds
- apps/worker/src/index.ts — the route table and handler registration; entry points for all three handler types
- apps/worker/README.md — the operational runbook
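For orientation, a wrangler.toml for a Worker like this tends to look roughly as follows. The binding and resource names here are invented placeholders; the real file in apps/worker/wrangler.toml is authoritative.

```toml
# Illustrative fragment only — names and IDs are placeholders.
name = "worker"
main = "src/index.ts"
compatibility_date = "2024-09-01"

# Weekly merchandising agent: Mondays 06:00 UTC
[triggers]
crons = ["0 6 * * 1"]

[[d1_databases]]
binding = "DB"
database_name = "example-db"
database_id = "..."

[[vectorize]]
binding = "VECTORS"
index_name = "example-index"

[[queues.consumers]]
queue = "human-review"

[[queues.consumers]]
queue = "shopify-actions"
```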
Trade-offs we accepted
- Wall-time budget. Workers requests have a 30-second wall-time limit on the Paid plan. Most agent turns fit comfortably; the long-tail ones (full delegation chains, multi-step Shopify lookups) can run up against it. The platform’s response is the /jobs async path: when a turn needs more than 30 seconds, it gets dispatched to a Durable Object that runs to completion in an alarm handler. See Durable Objects.
- No long-lived state in the Worker itself. Workers are request-scoped; nothing persists in the Worker between requests. Working memory and long-term memory both live in external storage (D1 + Vectorize + Durable Object storage).
- Bundle size limit. 1 MiB compressed; today the Worker is at 222 KiB gzipped (22% of budget), so plenty of headroom. Bundle size is tracked as follow-up #9.
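The wall-time escape hatch described above can be sketched like this. The JobRunner class name, the storage keys, and the stand-in runtime interfaces are assumptions for illustration, not the platform’s actual Durable Object code.

```typescript
// Sketch of the /jobs escape hatch: hand the turn to a Durable Object and let
// its alarm handler run it to completion. All names here are illustrative.

// Minimal stand-ins for the Durable Object runtime types, so the sketch is
// self-contained; in the real Worker these come from the Workers runtime.
interface JobStorage {
  put(key: string, value: unknown): Promise<void>;
  get(key: string): Promise<unknown>;
  setAlarm(time: number): Promise<void>;
}
interface DurableObjectState { storage: JobStorage }

class JobRunner {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    const job = await request.json();
    await this.state.storage.put("job", job);      // persist before acknowledging
    await this.state.storage.setAlarm(Date.now()); // fire the alarm ASAP
    return Response.json({ status: "queued" });
  }

  // The long-tail turn (delegation chains, multi-step Shopify lookups) runs
  // here, outside the original request's 30-second wall-time budget.
  async alarm(): Promise<void> {
    const job = await this.state.storage.get("job");
    // ...run the full agent turn, then persist the result for /jobs polling.
    await this.state.storage.put("result", { job, done: true });
  }
}

export { JobRunner };
```

The request handler only persists the job and schedules the alarm, so it returns well inside the budget; the polling side of /jobs then reads the stored result.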