# Cloudflare Vectorize

Cloudflare's managed vector index. Used for semantic recall: when an agent calls `recall_memory("refund history for sara@example.com")`, the platform embeds the query, searches Vectorize for the top-K nearest matches, and hydrates the full content from D1.
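The embed → search → hydrate path can be sketched as below. This is a hedged sketch, not the actual gateway: the binding names (`VECTOR_INDEX`, `DB`, `OPENAI_API_KEY`), the `memories` table, and the structural types are all illustrative assumptions; the real types come from `@cloudflare/workers-types` and the real code lives under `packages/memory/`.

```typescript
// Minimal structural types so the sketch stands alone; the real ones come
// from @cloudflare/workers-types. Binding names and the `memories` table
// are illustrative assumptions.
type QueryResult = { matches: { id: string; score: number }[] };
interface Env {
  VECTOR_INDEX: { query(v: number[], opts: { topK: number }): Promise<QueryResult> };
  DB: { prepare(sql: string): { bind(...a: unknown[]): { all(): Promise<{ results: unknown[] }> } } };
  OPENAI_API_KEY: string;
}

// Builds "?, ?, ?" for a parameterized SQL IN clause.
function inClausePlaceholders(n: number): string {
  return Array.from({ length: n }, () => "?").join(", ");
}

async function recallMemory(env: Env, query: string, topK = 20): Promise<unknown[]> {
  // 1. Embed the query with the model the index was built for (1536 dims).
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "text-embedding-3-small", input: query }),
  });
  const { data } = (await res.json()) as { data: { embedding: number[] }[] };

  // 2. Top-K nearest-neighbour search in Vectorize.
  const { matches } = await env.VECTOR_INDEX.query(data[0].embedding, { topK });

  // 3. Hydrate full rows from D1; Vectorize ids equal D1 row ids.
  const ids = matches.map((m) => m.id);
  const stmt = `SELECT * FROM memories WHERE id IN (${inClausePlaceholders(ids.length)})`;
  return (await env.DB.prepare(stmt).bind(...ids).all()).results;
}
```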
## What we use it for

One index named `agent-platform-lt-memory`:

- Dimensions: 1536 (matches OpenAI's `text-embedding-3-small`)
- Metric: cosine similarity
- Metadata indexes: `tenant_id` (string) and `agent_id` (string)
Each Vectorize entry has the same id as the corresponding D1 row.
The metadata fields are the only filterable surface; everything
else lives in D1 for the rich query layer.
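A write under this split can be sketched as a pair of operations sharing one id. The binding name (`MEMORY_INDEX`), table name (`memories`), and helper are hypothetical; only the shape of the Vectorize entry (`id`, `values`, `metadata`) follows the actual Worker API.

```typescript
// Pure helper: build the Vectorize entry for a memory row. Only the two
// filterable fields go into metadata; everything else stays in D1.
function toVectorizeEntry(id: string, embedding: number[], tenantId: string, agentId: string) {
  return { id, values: embedding, metadata: { tenant_id: tenantId, agent_id: agentId } };
}

// Hypothetical write path: the D1 row and the Vectorize entry share one id,
// so a vector match can be hydrated with a primary-key lookup.
async function storeMemory(
  env: {
    MEMORY_INDEX: { upsert(v: object[]): Promise<unknown> };
    DB: { prepare(sql: string): { bind(...a: unknown[]): { run(): Promise<unknown> } } };
  },
  id: string,
  content: string,
  embedding: number[],
  tenantId: string,
  agentId: string,
) {
  // Rich content and queryable fields live in D1...
  await env.DB.prepare(
    "INSERT INTO memories (id, tenant_id, agent_id, content) VALUES (?, ?, ?, ?)",
  ).bind(id, tenantId, agentId, content).run();

  // ...while Vectorize holds only the embedding plus the filterable metadata.
  await env.MEMORY_INDEX.upsert([toVectorizeEntry(id, embedding, tenantId, agentId)]);
}
```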
The metadata indexes are structural: a query filtered on `agent_id = 'agent-refund-decision'` and `tenant_id = 'default'` is narrowed by Vectorize before the similarity search runs. This is the mechanism that prevents one agent's memories from bleeding into another's.
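The scoping described above is just a `filter` option on the query call. A minimal sketch, assuming a hypothetical binding name; the plain-equality filter syntax is what Vectorize's metadata filtering accepts, and the filter keys must match the metadata indexes declared at index creation:

```typescript
// Build query options that scope a similarity search to one tenant + agent.
// Vectorize applies the filter before the nearest-neighbour search.
function scopedQueryOptions(tenantId: string, agentId: string, topK = 20) {
  return {
    topK,
    // Plain equality filters; Vectorize also supports $eq/$ne operators.
    filter: { tenant_id: tenantId, agent_id: agentId },
  };
}

// Usage (hypothetical binding name MEMORY_INDEX):
// const { matches } = await env.MEMORY_INDEX.query(
//   embedding,
//   scopedQueryOptions("default", "agent-refund-decision"),
// );
```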
## Why we picked it

The choice was: where do we put the vector index for semantic recall?
| Option | Verdict |
|---|---|
| Cloudflare Vectorize | Chosen. Worker binding (no REST API), free tier covers Phase 1, metadata filtering, no infrastructure. |
| Pinecone | Mature; great query language. But it’s an external service: REST API, separate auth, separate billing, separate latency profile. |
| Weaviate (self-hosted) | Strong feature set but we’d run a cluster. Loses the “no infra” property. |
| pgvector on Postgres | Single-store appeal (D1’s vector cousin), but pgvector requires Postgres; Workers + D1 don’t have it natively. |
| In-memory FAISS in the Worker | Works for a few hundred vectors; falls over at any real scale. Resets every cold start. |
Vectorize won because it’s the closest fit to “vector search via a Worker binding, no setup.” See ADR-0030 for the full long-term-memory access pattern.
## What it costs

Vectorize free tier (included with Workers Paid):
- 30M queried vector dimensions per month
- 5M stored vector dimensions
At 1536 dimensions per entry, 5M stored ÷ 1536 ≈ 3,255 stored entries in the free tier. Phase 1 has 10. We have multiple orders of magnitude of headroom before this matters.
Queried dimensions: each top-K query costs K × 1536 queried dimensions (we typically query top-20). 30M ÷ (20 × 1536) ≈ 976 queries per month on the free tier. At Phase 1's demo scale this is fine; at production volume we'd be on the paid tier within weeks.
After free tier: $0.04 per million queried dimensions, $0.05 per 100M stored dimensions per month.
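The free-tier arithmetic above, restated as small checkable functions (the dimension counts and budgets are the ones from this section, not an API):

```typescript
// Dimensions per entry for text-embedding-3-small.
const DIMS = 1536;

// Stored-entry capacity: 5M stored dimensions / 1536 dims per entry ~ 3,255 entries.
function storedEntryCapacity(storedDimBudget: number, dims = DIMS): number {
  return Math.floor(storedDimBudget / dims);
}

// Each top-K query consumes K x dims queried dimensions;
// 30M / (20 x 1536) ~ 976 free queries per month.
function freeQueriesPerMonth(queriedDimBudget: number, topK = 20, dims = DIMS): number {
  return Math.floor(queriedDimBudget / (topK * dims));
}
```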
## What it replaces

A dedicated vector DB cluster (Pinecone, Weaviate, Qdrant) with its own auth, network hop, monitoring, billing, and connection pooling. Vectorize reduces this to a binding declaration plus an embedding-format-and-metric configuration done once at index creation.
## Where to look

- `packages/memory/src/vectorize-backed-long-term-memory.ts` — the gateway that turns a `recall_memory` call into a Vectorize search + D1 hydrate
- `packages/memory/src/long-term-memory-storage.ts` — the Vectorize-binding interface the gateway uses
- `apps/worker/wrangler.toml` — the `[[vectorize]]` binding
## Trade-offs we accepted

- No delete-by-metadata. As of mid-2026, Vectorize doesn't support `DELETE WHERE metadata.tenant_id = ?`. Wiping a tenant's vectors requires deleting the entire index and recreating it (which deletes ALL tenants). Acceptable for v1 single-tenant; tracked as follow-up #11. Soft-delete via a metadata flag is a workaround if we need it before CF ships the feature.
- Metadata indexes can't be added retroactively. Indexes must be declared before any vector is inserted; adding one after the fact requires recreating the index. Phase 1's setup was a 4-step ritual: create index → create `tenant_id` metadata index → create `agent_id` metadata index → start inserting. Documented in the worker README.
- Eventually-consistent metadata index propagation. Newly created metadata indexes take a few hundred milliseconds to process (visible as `processedUpToMutation` in `wrangler vectorize info`). For a deploy-time setup this is fine.