Cloudflare Workers

The platform’s main runtime. Every HTTP request, every cron tick, every queue message that the platform handles runs inside a Cloudflare Worker.

Three handler types in one Worker (apps/worker):

  • fetch — the HTTP API. Routes for /health, /run (synchronous agent turns), /jobs (async submission + polling), /admin/seed-memory. Bearer-token auth on everything except /health.
  • scheduled — cron triggers. Today the only cron is the weekly merchandising agent, Mondays at 06:00 UTC. Cron-driven runs bypass auth (Cloudflare invokes the handler directly).
  • queue — consumer for the async event bus. Reads from the human-review and shopify-actions queues; today it is logs-only, and Phase 2 will wire in real mutations.

A single Worker handles all three. One deploy, three entry points.
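The three-entry-point shape can be sketched as a single exported module. This is a minimal illustration, not the project's actual code: handler bodies are placeholders, and the Env fields and message shapes are assumed names, not the repo's real bindings.

```typescript
// Sketch of one Worker module exposing all three handler types.
// Env fields and message shapes here are illustrative, not the
// project's actual binding names.
type Env = { API_TOKEN: string };

const worker = {
  // fetch — HTTP API; bearer-token auth on everything except /health
  async fetch(request: Request, env: Env): Promise<Response> {
    const { pathname } = new URL(request.url);
    if (pathname === "/health") return new Response("ok");
    if (request.headers.get("Authorization") !== `Bearer ${env.API_TOKEN}`) {
      return new Response("unauthorized", { status: 401 });
    }
    // ...dispatch /run, /jobs, /admin/seed-memory here
    return new Response("not found", { status: 404 });
  },

  // scheduled — cron entry point; invoked directly by Cloudflare, so no auth
  async scheduled(_controller: unknown, _env: Env): Promise<void> {
    // kick off the weekly merchandising agent
  },

  // queue — consumer for the async event bus (logs-only in Phase 1)
  async queue(batch: { messages: { body: unknown }[] }, _env: Env): Promise<void> {
    for (const msg of batch.messages) console.log("queue message:", msg.body);
  },
};

export default worker;
```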

The decision was about where the agent runtime lives. The real options were:

| Option | Verdict |
| --- | --- |
| Cloudflare Workers | Chosen. Edge cold start <10ms; native bindings to D1, Vectorize, Queues, Durable Objects, KV, R2. No infrastructure to manage. |
| AWS Lambda | Cold starts run hundreds of ms; adjacent services (DynamoDB, SQS, Pinecone) are separate billing/auth surfaces. More moving parts. |
| Long-running Node server (Fly, Render, Railway) | Need to run a server, monitor it, scale it. Loses the auto-scale-to-zero property. |
| Local-only (no deploy) | Defers the “does this work in production” question. Bad for a multi-tenant platform play. |

The deciding factors were edge cold-start latency (an agent turn already takes 10-30 seconds; you can’t add a 2-second cold start on top) and bindings (the Vectorize binding is a Worker-only API; using Vectorize from elsewhere means the public REST API plus auth tokens, a strict downgrade).

Cloudflare Workers Paid plan: $5/month flat for the account, which includes 10M requests + 30M CPU-ms per month. Phase 1’s demo workload uses well under 0.1% of that.

CPU time is the only meaningful axis. Each agent turn is ~100ms of CPU (most of the wall time is awaiting upstream LLM/Shopify APIs, which doesn’t count). At the Workers Paid tier, CPU is free up to the cap and $0.02 per million CPU-ms after. Even at millions of agent turns per month the bill stays trivial.
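The “trivial bill” claim is easy to check with back-of-envelope arithmetic from the numbers above (the function name and the ~100ms/turn figure are taken from this doc; treat it as an estimate, not a billing calculator):

```typescript
// Back-of-envelope CPU cost at the Workers Paid tier.
const cpuMsPerTurn = 100;           // ~100ms of CPU per agent turn
const includedCpuMs = 30_000_000;   // 30M CPU-ms included per month
const pricePerMillionCpuMs = 0.02;  // USD beyond the included cap

function monthlyCpuCost(turns: number): number {
  const overageMs = Math.max(0, turns * cpuMsPerTurn - includedCpuMs);
  return (overageMs / 1_000_000) * pricePerMillionCpuMs;
}

// 1M turns/month → 100M CPU-ms → 70M over the cap → about $1.40
console.log(monthlyCpuCost(1_000_000).toFixed(2));
```

Even a million agent turns a month lands around a dollar and change of CPU overage, on top of the $5 flat fee.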

If this were a traditional architecture:

  • An EC2 / GCE / Hetzner box running Node, plus monitoring
  • An auto-scaler in front of it
  • A separate vector DB cluster (Pinecone / Weaviate)
  • A separate message broker (RabbitMQ / SQS)
  • A separate cron service or in-process scheduler
  • A separate KV cache (Redis) for working memory
  • A load balancer

That stack is what most “production AI agent” tutorials assume. Workers + the surrounding Cloudflare services replace all of it with one bundle, one deploy, one bill.

  • apps/worker/wrangler.toml — the deploy config; lists every binding the Worker holds
  • apps/worker/src/index.ts — the route table and handler registration; entry points for all three handler types
  • apps/worker/README.md — the operational runbook
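For orientation, a wrangler.toml for a Worker like this generally has the shape below. This is a hedged sketch, not the repo’s actual file: binding names, the database name, and the index name are illustrative; only the cron schedule and queue names come from this doc.

```toml
name = "worker"
main = "src/index.ts"

# Weekly merchandising agent: Mondays 06:00 UTC
[triggers]
crons = ["0 6 * * MON"]

# Queue consumers for the async event bus
[[queues.consumers]]
queue = "human-review"

[[queues.consumers]]
queue = "shopify-actions"

# Storage bindings (names illustrative)
[[d1_databases]]
binding = "DB"
database_name = "platform"
database_id = "..."

[[vectorize]]
binding = "VECTORIZE"
index_name = "memory"
```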
  • Wall-time budget. Workers requests have a 30-second wall-time limit on the Paid plan. Most agent turns fit comfortably; the long-tail ones (full delegation chains, multi-step Shopify lookups) can run up against it. The platform’s response is the /jobs async path: when a turn needs more than 30 seconds, it gets dispatched to a Durable Object that runs to completion in an alarm handler. See Durable Objects.
  • No long-lived state in the Worker itself. Workers are request-scoped; nothing persists in the Worker between requests. Working memory and long-term memory both live in external storage (D1 + Vectorize + Durable Object storage).
  • Bundle size limit. 1 MiB compressed; today the Worker is at 222 KiB gzipped (22% of budget), so plenty of headroom. Bundle size is tracked as follow-up #9.