Cloudflare Vectorize

Cloudflare’s managed vector index. Used for semantic recall — when an agent calls recall_memory("refund history for sara@example.com"), the platform embeds the query, searches Vectorize for the top-K nearest matches, and hydrates the full content from D1.
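
A minimal sketch of that flow inside a Worker, assuming a Vectorize binding named VECTORIZE, a D1 binding named DB, a memories table, and an embed() helper for text-embedding-3-small — all of these names are illustrative, not the platform’s actual identifiers:

```ts
// Sketch only: binding names (VECTORIZE, DB), the memories table, and
// embed() are assumptions for illustration.
// VectorizeIndex and D1Database types come from @cloudflare/workers-types.
interface Env {
  VECTORIZE: VectorizeIndex;
  DB: D1Database;
}

// Assumed helper that embeds text with text-embedding-3-small (1536 dims).
declare function embed(text: string): Promise<number[]>;

async function recallMemory(
  env: Env,
  query: string,
  tenantId: string,
  agentId: string
) {
  // 1. Embed the query with the same model the index was built for.
  const vector = await embed(query);

  // 2. Top-K nearest-neighbour search, pre-filtered by tenant and agent.
  const { matches } = await env.VECTORIZE.query(vector, {
    topK: 20,
    filter: { tenant_id: tenantId, agent_id: agentId },
  });
  if (matches.length === 0) return [];

  // 3. Hydrate the full rows from D1; Vectorize ids mirror D1 row ids.
  const placeholders = matches.map(() => "?").join(", ");
  const { results } = await env.DB.prepare(
    `SELECT * FROM memories WHERE id IN (${placeholders})`
  )
    .bind(...matches.map((m) => m.id))
    .all();
  return results;
}
```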

One index named agent-platform-lt-memory:

  • Dimensions: 1536 (matches OpenAI’s text-embedding-3-small)
  • Metric: cosine similarity
  • Metadata indexes: tenant_id (string) and agent_id (string)

Each Vectorize entry has the same id as the corresponding D1 row. The metadata fields are the only filterable surface; everything else lives in D1 for the rich query layer.

The metadata indexes are structural: a query filtered on agent_id = 'agent-refund-decision' and tenant_id = 'default' is narrowed by Vectorize before the similarity search runs. This is the mechanism that prevents one agent’s memories from bleeding into another’s.
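
The filter only has something to match because tenant_id and agent_id are written as metadata at insert time. A sketch of the write path, under the same illustrative names as above:

```ts
// Sketch: writing a memory so the recall filter can find it.
// `row.id` is the D1 primary key; the Vectorize entry reuses it verbatim.
async function storeMemory(
  env: Env,
  row: { id: string; tenantId: string; agentId: string },
  embedding: number[] // 1536 dims, same model as the index
) {
  await env.VECTORIZE.upsert([
    {
      id: row.id, // same id as the corresponding D1 row
      values: embedding,
      // tenant_id / agent_id are the only filterable fields, so they
      // must be present on every vector.
      metadata: { tenant_id: row.tenantId, agent_id: row.agentId },
    },
  ]);
}
```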

The choice was: where do we put the vector index for semantic recall?

  • Cloudflare Vectorize: Chosen. Worker binding (no REST API), free tier covers Phase 1, metadata filtering, no infrastructure.
  • Pinecone: Mature, with a great query language, but an external service: REST API, separate auth, separate billing, separate latency profile.
  • Weaviate (self-hosted): Strong feature set, but we’d be running a cluster. Loses the “no infra” property.
  • pgvector on Postgres: Single-store appeal (vectors next to the relational rows), but pgvector requires Postgres, and Workers + D1 don’t have it natively.
  • In-memory FAISS in the Worker: Works for a few hundred vectors; falls over at any real scale, and resets on every cold start.

Vectorize won because it’s the closest fit to “vector search via a Worker binding, no setup.” See ADR-0030 for the full long-term-memory access pattern.

Vectorize free tier (included with Workers Paid):

  • 30M queried vector dimensions per month
  • 5M stored vector dimensions

At 1536 dimensions per entry, 5M stored ÷ 1536 ≈ 3,250 stored entries in the free tier. Phase 1 has 10. We have multiple orders of magnitude before this matters.

Queried dimensions: each top-K query costs K × 1536 queried dimensions (we typically query top-20). 30M ÷ (20 × 1536) ≈ 975 queries per month on the free tier. At Phase 1’s demo scale that is fine; at production volume we’d be on the paid tier within weeks.
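
The arithmetic, spelled out (constants mirror the quota figures above; the results are exact floors rather than the rounded numbers in the prose):

```ts
const DIMS = 1536;                // text-embedding-3-small
const STORED_QUOTA = 5_000_000;   // free-tier stored dimensions
const QUERIED_QUOTA = 30_000_000; // free-tier queried dimensions per month
const TOP_K = 20;

const maxStoredEntries = Math.floor(STORED_QUOTA / DIMS);              // 3255
const maxQueriesPerMonth = Math.floor(QUERIED_QUOTA / (TOP_K * DIMS)); // 976
```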

After free tier: $0.04 per million queried dimensions, $0.05 per 100M stored dimensions per month.

The alternative is a dedicated vector DB cluster (Pinecone, Weaviate, Qdrant) with its own auth, network hop, monitoring, billing, and connection pooling. Vectorize reduces all of that to a binding declaration plus an embedding-format-and-metric configuration done once at index creation.
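
The binding declaration is a handful of lines in apps/worker/wrangler.toml; the binding name here is illustrative:

```toml
[[vectorize]]
binding = "VECTORIZE"                    # name exposed on the Worker's env
index_name = "agent-platform-lt-memory"
```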

  • packages/memory/src/vectorize-backed-long-term-memory.ts — the gateway that turns a recall_memory call into a Vectorize search + D1 hydrate
  • packages/memory/src/long-term-memory-storage.ts — the Vectorize-binding interface the gateway uses
  • apps/worker/wrangler.toml — the [[vectorize]] binding
  • No delete-by-metadata. As of mid-2026, Vectorize doesn’t support DELETE WHERE metadata.tenant_id = ?. Wiping a tenant’s vectors requires deleting the entire index and recreating it (which deletes ALL tenants). Acceptable for v1 single-tenant; tracked as follow-up #11. Soft-delete via metadata flag is a workaround if we need it before CF ships the feature.
  • Metadata indexes can’t be added retroactively. Indexes must be declared before any vector is inserted; adding one after the fact requires recreating the index. Phase 1’s setup was a 4-step ritual: create index → create tenant_id metadata index → create agent_id metadata index → start inserting. Documented in the worker README; a command sketch follows this list.
  • Eventually-consistent metadata index propagation. Newly created metadata indexes take a few hundred milliseconds to process (visible as processedUpToMutation in wrangler vectorize info). For a deploy-time setup this is fine.
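
A sketch of that setup ritual with the wrangler CLI (verify flags against the worker README before running):

```sh
# 1. Create the index. Dimensions and metric are fixed at creation time.
npx wrangler vectorize create agent-platform-lt-memory \
  --dimensions=1536 --metric=cosine

# 2–3. Declare both metadata indexes BEFORE inserting any vector;
#      they cannot be added retroactively.
npx wrangler vectorize create-metadata-index agent-platform-lt-memory \
  --property-name=tenant_id --type=string
npx wrangler vectorize create-metadata-index agent-platform-lt-memory \
  --property-name=agent_id --type=string

# 4. Start inserting. Check metadata-index propagation
#    (processedUpToMutation) with:
npx wrangler vectorize info agent-platform-lt-memory
```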