Cloudflare Queues

Cloudflare’s managed message queue service. Used as the platform’s event bus — when an agent decides something needs to happen (escalate to a human, refund an order), it doesn’t act directly; it emits an event onto a queue, and a separate consumer picks it up.

This separation is the platform’s central safety property. The agent recommends; the consumer acts. A human approval gate fits naturally between them.

Two queues:

  • human-review — events that need a human eye. Today the consumer logs them; Phase 2 wires a real review UI.
  • shopify-actions — Shopify mutations the platform wants to make (refund, cancel, annotate). Today the consumer logs them; Phase 2 executes them.

Each topic has a typed Zod schema (in packages/event-bus/src/topics/) that the producer side validates against before publishing. The consumer side re-validates on receipt. Mismatches bounce to the consumer’s dead-letter handler.

A single Worker is both producer (via the queue binding’s send()) and consumer (via its queue() handler). One deploy handles both sides.
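The dual-role Worker can be sketched like this. The binding name HUMAN_REVIEW and the event fields are assumptions for illustration; the interfaces stand in for Cloudflare’s runtime types so the sketch is self-contained:

```typescript
// Stand-ins for Cloudflare's Queue, Message, and MessageBatch types.
interface QueueBinding { send(body: unknown): Promise<void>; }
interface QueueMessage { body: unknown; ack(): void; retry(): void; }
interface Batch { queue: string; messages: QueueMessage[]; }

export interface Env { HUMAN_REVIEW: QueueBinding; }

export const worker = {
  // Producer side: an agent decision becomes an event on the queue.
  // The agent never performs the action itself.
  async fetch(_req: Request, env: Env): Promise<Response> {
    await env.HUMAN_REVIEW.send({ id: "evt_1", reason: "escalate" });
    return new Response("queued");
  },

  // Consumer side: the same deploy receives batches from both queues.
  async queue(batch: Batch, _env: Env): Promise<void> {
    for (const msg of batch.messages) {
      console.log(batch.queue, msg.body); // Phase 1: log-only consumer
      msg.ack();                          // acked individually, per message
    }
  },
};

export default worker;
```

One object exports both handlers; Cloudflare routes HTTP to fetch and queue deliveries to queue, so a single deploy covers both roles.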

The choice was: how do agents and consumers communicate asynchronously?

  • Cloudflare Queues — Chosen. Native Worker binding for both producer and consumer; per-message ack/retry semantics; included in Workers Paid.
  • AWS SQS / EventBridge — Adds a REST API, auth tokens, and a separate billing surface; loses the binding-only ergonomics.
  • RabbitMQ, self-hosted — We’d run a cluster. Wrong scale for a platform-as-a-service play.
  • In-process events / direct calls — Defeats the safety property. The agent and the action would share a transaction; a buggy agent could cause real-world side effects.
  • Cloudflare Workflows — Strong fit for orchestrated workflows, but heavier than what we need; queues are right-sized.

Queues won because they make the producer and consumer fully decoupled (different deploy units in principle; same Worker in practice today) without adding any external service. See ADR-0032 for the full async-coordination decision.

Cloudflare Queues free tier (included with Workers Paid):

  • 1M operations per month (publish + consume each count)

Phase 1’s demo emits at most 1–2 events per agent run. Since publish and consume each count as an operation, tens of thousands of runs per month stay well within the free tier.
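Back-of-envelope arithmetic for the claim, taking 50,000 runs/month as a stand-in for "tens of thousands":

```typescript
// Each event costs two operations: one publish and one consume.
const runsPerMonth = 50_000;   // assumed; "tens of thousands"
const eventsPerRun = 2;        // Phase 1 worst case
const opsPerEvent = 2;         // publish + consume each count
export const opsPerMonth = runsPerMonth * eventsPerRun * opsPerEvent;
console.log(opsPerMonth);      // 200,000 — a fifth of the 1M free tier
```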

After free tier: $0.40 per million operations.

The alternative would be a dedicated message broker (RabbitMQ, AWS SQS, GCP Pub/Sub) with its own auth, network hop, monitoring, and billing. Queues reduce all of that to [[queues.producers]] and [[queues.consumers]] declarations in wrangler.toml.
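The shape of that wiring, sketched for the two queues named above (batch sizes are illustrative; the blocks in apps/worker/wrangler.toml are authoritative):

```toml
# Producer bindings: binding names here are assumptions for illustration.
[[queues.producers]]
queue = "human-review"
binding = "HUMAN_REVIEW"

[[queues.producers]]
queue = "shopify-actions"
binding = "SHOPIFY_ACTIONS"

# Consumer registrations: the same Worker consumes both queues.
[[queues.consumers]]
queue = "human-review"
max_batch_size = 10

[[queues.consumers]]
queue = "shopify-actions"
max_batch_size = 10
```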

  • packages/event-bus/ — the EventBus interface plus the Cloudflare Queues implementation and a Mock for tests
  • apps/worker/src/queue-consumer.ts — the consumer entry point that reads off both queues
  • apps/worker/wrangler.toml — the [[queues.producers]] and [[queues.consumers]] blocks
  • At-least-once delivery, not exactly-once. A message might be delivered to the consumer more than once if the consumer fails after processing but before acking. Idempotency is the consumer’s responsibility — Phase 2’s Shopify mutations will need an idempotency key per event.id. Tracked as part of the Phase 2 design.
  • No DLQ + retry policy yet. Failed messages today just bubble up; Phase 1’s consumer is logs-only so there’s nothing to retry. Phase 2 wires the DLQ — tracked as follow-up #10.
  • Consumer batching. Each invocation receives a batch of messages; the consumer must process all of them and ack individually. Already wired in apps/worker/src/queue-consumer.ts.
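The last two points combine naturally: a batch consumer that acks each message individually and drops redeliveries. A sketch under assumed names, with an in-memory Set standing in for the durable dedupe store Phase 2 would need:

```typescript
// Stand-ins for Cloudflare's queue consumer types.
interface QueueMessage<T> { body: T; ack(): void; retry(): void; }
interface MessageBatch<T> { queue: string; messages: QueueMessage<T>[]; }

type AgentEvent = { id: string };

// makeConsumer is a hypothetical factory; `processed` stands in for
// durable storage keyed by event.id (the idempotency key).
export function makeConsumer(processed: Set<string>) {
  return async (batch: MessageBatch<AgentEvent>): Promise<void> => {
    for (const msg of batch.messages) {
      if (processed.has(msg.body.id)) {
        msg.ack();            // at-least-once redelivery: ack, don't re-act
        continue;
      }
      try {
        // Phase 2: perform the Shopify mutation here, keyed by event.id.
        processed.add(msg.body.id);
        msg.ack();            // ack only after the side effect succeeds
      } catch {
        msg.retry();          // leave the message for redelivery
      }
    }
  };
}
```

Acking per message rather than per batch means one poison message doesn’t force the whole batch back onto the queue.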