
Cloudflare D1

Cloudflare’s edge-replicated SQLite database, used as the source-of-truth row store for long-term memory and paired with Vectorize for the search index.

A single table in a single database (agent_platform):

CREATE TABLE long_term_memory (
  id                   TEXT PRIMARY KEY,  -- ULID
  tenant_id            TEXT NOT NULL,     -- structural tenant scope
  agent_id             TEXT NOT NULL,     -- which agent this memory belongs to
  content              TEXT NOT NULL,     -- the actual memory text
  metadata             TEXT NOT NULL,     -- JSON-encoded
  embedding_provider   TEXT,              -- e.g. 'openai'
  embedding_model      TEXT,              -- e.g. 'text-embedding-3-small'
  embedding_dimensions INTEGER,
  created_at           TEXT NOT NULL
);
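
For concreteness, a hypothetical row (the ULID, timestamp, and metadata values below are made up; metadata is stored as a JSON string):

```sql
-- Illustrative only; actual IDs are generated ULIDs and metadata shape
-- is up to the caller.
INSERT INTO long_term_memory
  (id, tenant_id, agent_id, content, metadata,
   embedding_provider, embedding_model, embedding_dimensions, created_at)
VALUES
  ('01J8ZC3V9GK5Q2W7X4N6M8P0RS', 'tenant-a', 'agent-support',
   'Customer prefers email over phone contact.',
   '{"source":"conversation","importance":"high"}',
   'openai', 'text-embedding-3-small', 1536,
   '2025-01-15T12:00:00Z');
```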

Every long-term memory entry exists as a row here. The vector embedding for the same entry lives in Vectorize. An entry’s id is the same in both stores — Vectorize’s metadata holds just tenant_id and agent_id for filtering; the full content is hydrated from D1 after the vector search returns the matching IDs.
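
The hydration step can be sketched as two small helpers; names here are illustrative (the real access layer lives in packages/memory/src/long-term-memory-storage.ts), and the Worker would execute the query via its D1 binding:

```typescript
// Sketch of hydrating Vectorize matches from D1. The function names and the
// MemoryRow shape are assumptions for illustration.

interface MemoryRow {
  id: string;
  content: string;
  metadata: string; // JSON-encoded, per the schema
}

// Build a parameterized SELECT for a batch of IDs: one "?" placeholder per id.
function buildHydrationQuery(idCount: number): string {
  const placeholders = new Array(idCount).fill("?").join(", ");
  return `SELECT id, content, metadata FROM long_term_memory WHERE id IN (${placeholders})`;
}

// D1 returns rows in arbitrary order; re-order them to match Vectorize's
// relevance ranking, dropping IDs with no surviving row (e.g. deleted entries).
function orderByMatches(matchIds: string[], rows: MemoryRow[]): MemoryRow[] {
  const byId = new Map(rows.map((r) => [r.id, r]));
  return matchIds
    .map((id) => byId.get(id))
    .filter((r): r is MemoryRow => r !== undefined);
}
```

In the Worker this would run as something like `env.DB.prepare(buildHydrationQuery(ids.length)).bind(...ids).all()`; the `DB` binding name is an assumption.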

The choice was: where do we keep the source-of-truth row data for long-term memory?

  • Cloudflare D1: Chosen. Native Worker binding (no auth/network hop), SQL semantics, free tier covers Phase 1 and well into Phase 2, edge-replicated for read latency.
  • Vectorize alone (store content in vector metadata): Vectorize metadata has size limits and isn’t designed as a primary store. Loses queryability.
  • External SQL (Postgres on Neon / Supabase / RDS): Adds a network hop from the Worker, separate auth, separate billing. Loses the no-infra story.
  • Cloudflare KV: Key-value only, no queries. Doesn’t fit a row store with metadata filters.
  • R2 (object storage): Wrong shape entirely; would need a query layer on top.

D1 won because it’s the closest fit to “I want a relational table available from the Worker with no setup.” See ADR-0032 for the full storage-primitives decision.

D1 free tier (the daily quotas below are the Workers Free plan’s):

  • 5 GB total storage
  • 5M rows read per day
  • 100K rows written per day

Phase 1’s seed inserts 10 rows. Each agent run reads at most a handful of rows (the matches from a Vectorize search, hydrated from D1). Tens of thousands of agent runs per day stay comfortably within the free tier.

After free tier: $0.75 per GB-month of storage; $0.001 per million rows read; $1.00 per million rows written.
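
As a budget sanity check, a back-of-envelope estimator; the allowances and unit prices are parameters rather than hard-coded facts, since Cloudflare’s pricing page is the source of truth for current numbers:

```typescript
// Back-of-envelope D1 overage estimator. All allowances and prices are
// caller-supplied assumptions; verify against Cloudflare's pricing page.

interface D1Usage {
  storageGb: number;        // total data stored
  rowsReadPerDay: number;
  rowsWrittenPerDay: number;
}

interface D1Pricing {
  freeStorageGb: number;
  freeRowsReadPerDay: number;
  freeRowsWrittenPerDay: number;
  storagePricePerGbMonth: number; // USD per GB-month beyond free
  pricePerRowRead: number;        // USD per single row read beyond free
  pricePerRowWritten: number;     // USD per single row written beyond free
}

function estimateMonthlyOverage(u: D1Usage, p: D1Pricing, days = 30): number {
  const storage = Math.max(0, u.storageGb - p.freeStorageGb) * p.storagePricePerGbMonth;
  const reads = Math.max(0, u.rowsReadPerDay - p.freeRowsReadPerDay) * days * p.pricePerRowRead;
  const writes = Math.max(0, u.rowsWrittenPerDay - p.freeRowsWrittenPerDay) * days * p.pricePerRowWritten;
  return storage + reads + writes;
}
```

With Phase 1’s numbers (10 seeded rows, a handful of reads per run), every term is zero, which is the point of the free tier.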

The alternative was a dedicated relational database (Postgres on Neon, RDS, or similar) with its own connection pool, auth, monitoring, and billing. D1 reduces all of that to a binding declaration in wrangler.toml and a SQL migration file in apps/worker/migrations/.
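
The binding declaration looks roughly like this; the binding name and placeholder ID are illustrative, with the real database_id coming from `wrangler d1 create`:

```toml
# apps/worker/wrangler.toml -- illustrative values.
[[d1_databases]]
binding = "DB"                       # exposed as env.DB in the Worker
database_name = "agent_platform"
database_id = "<your-database-id>"
migrations_dir = "migrations"
```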

  • apps/worker/migrations/0001_long_term_memory.sql — the schema migration; only one migration today
  • packages/memory/src/long-term-memory-storage.ts — the D1 access layer (typed prepared statements; no ORM)
  • apps/worker/wrangler.toml — the [[d1_databases]] binding
  • No ORM. We use raw prepared statements. Drizzle was considered but adds a dependency for one table. Will revisit if the schema grows past ~5 tables.
  • Edge replication is read-only. Writes go to a primary region (set at database creation; ours is EEUR per the deploy log). Read replicas in other regions catch up within milliseconds to seconds. For a memory subsystem, eventual consistency on reads is fine — a recall query that misses a just-written entry simply doesn’t surface it on this turn.
  • No delete-by-metadata. Like Vectorize, D1 has no declarative metadata index purpose-built for fast deletes; we use a plain DELETE WHERE tenant_id = ?. The tenant_id column is indexed, so the delete touches only that tenant’s rows, which is fine for tens of thousands of rows. It will need re-architecting at multi-million-row scale, which is far away.
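
In SQL terms, the delete path is just an ordinary secondary index plus a tenant-scoped DELETE; the index name here is an assumption (the actual one lives in the migration file):

```sql
-- Hypothetical index name; the real migration may differ.
CREATE INDEX IF NOT EXISTS idx_long_term_memory_tenant
  ON long_term_memory (tenant_id);

-- Remove all of one tenant's memories. With the index, SQLite locates that
-- tenant's rows directly instead of scanning the whole table.
DELETE FROM long_term_memory WHERE tenant_id = ?;
```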