Cloudflare Vectorize

Cloudflare’s managed vector index. Used for semantic recall — when an agent calls recall_memory("refund history for sara@example.com"), the platform embeds the query, searches Vectorize for the top-K nearest matches, and hydrates the full content from D1.
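
A minimal sketch of that flow inside a Worker, assuming a Vectorize binding named VECTORIZE, a D1 binding named DB, a memories table, and an embed() helper for text-embedding-3-small — all of these names are illustrative, not the platform’s actual identifiers:

```ts
// Sketch only: binding names (VECTORIZE, DB), the memories table, and
// embed() are assumptions for illustration.
// VectorizeIndex and D1Database types come from @cloudflare/workers-types.
interface Env {
  VECTORIZE: VectorizeIndex;
  DB: D1Database;
}

// Assumed helper that embeds text with text-embedding-3-small (1536 dims).
declare function embed(text: string): Promise<number[]>;

async function recallMemory(
  env: Env,
  query: string,
  tenantId: string,
  agentId: string
) {
  // 1. Embed the query with the same model the index was built for.
  const vector = await embed(query);

  // 2. Top-K nearest-neighbour search, pre-filtered by tenant and agent.
  const { matches } = await env.VECTORIZE.query(vector, {
    topK: 20,
    filter: { tenant_id: tenantId, agent_id: agentId },
  });
  if (matches.length === 0) return [];

  // 3. Hydrate the full rows from D1; Vectorize ids mirror D1 row ids.
  const placeholders = matches.map(() => "?").join(", ");
  const { results } = await env.DB.prepare(
    `SELECT * FROM memories WHERE id IN (${placeholders})`
  )
    .bind(...matches.map((m) => m.id))
    .all();
  return results;
}
```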

One index named agent-platform-lt-memory:

  • Dimensions: 1536 (matches OpenAI’s text-embedding-3-small)
  • Metric: cosine similarity
  • Metadata indexes: tenant_id (string) and agent_id (string)

Each Vectorize entry has the same id as the corresponding D1 row. The metadata fields are the only filterable surface; everything else lives in D1 for the rich query layer.

The metadata indexes are structural: a query filtered on agent_id = 'agent-refund-decision' and tenant_id = 'default' is narrowed by Vectorize before the similarity search runs. This is the mechanism that prevents one agent’s memories from bleeding into another’s.
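
The filter only has something to match because tenant_id and agent_id are written as metadata at insert time. A sketch of the write path, under the same illustrative names as above:

```ts
// Sketch: writing a memory so the recall filter can find it.
// `row.id` is the D1 primary key; the Vectorize entry reuses it verbatim.
async function storeMemory(
  env: Env,
  row: { id: string; tenantId: string; agentId: string },
  embedding: number[] // 1536 dims, same model as the index
) {
  await env.VECTORIZE.upsert([
    {
      id: row.id, // same id as the corresponding D1 row
      values: embedding,
      // tenant_id / agent_id are the only filterable fields, so they
      // must be present on every vector.
      metadata: { tenant_id: row.tenantId, agent_id: row.agentId },
    },
  ]);
}
```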

The choice was: where do we put the vector index for semantic recall?

  • Cloudflare Vectorize: Chosen. Worker binding (no REST API), free tier covers Phase 1, metadata filtering, no infrastructure.
  • Pinecone: Mature, with a great query language, but an external service: REST API, separate auth, separate billing, separate latency profile.
  • Weaviate (self-hosted): Strong feature set, but we’d be running a cluster. Loses the “no infra” property.
  • pgvector on Postgres: Single-store appeal (vectors next to the relational rows), but pgvector requires Postgres, and Workers + D1 don’t have it natively.
  • In-memory FAISS in the Worker: Works for a few hundred vectors; falls over at any real scale, and resets on every cold start.

Vectorize won because it’s the closest fit to “vector search via a Worker binding, no setup.” See ADR-0030 for the full long-term-memory access pattern.

Vectorize free tier (included with Workers Paid):

  • 30M queried vector dimensions per month
  • 5M stored vector dimensions

At 1536 dimensions per entry, 5M stored ÷ 1536 ≈ 3,250 stored entries in the free tier. Phase 1 has 10. We have multiple orders of magnitude before this matters.

Queried dimensions: each top-K query costs K × 1536 queried dimensions (we typically query top-20). 30M ÷ (20 × 1536) ≈ 975 queries per month on the free tier. At Phase 1’s demo scale that is fine; at production volume we’d be on the paid tier within weeks.
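
The arithmetic, spelled out (constants mirror the quota figures above; the results are exact floors rather than the rounded numbers in the prose):

```ts
const DIMS = 1536;                // text-embedding-3-small
const STORED_QUOTA = 5_000_000;   // free-tier stored dimensions
const QUERIED_QUOTA = 30_000_000; // free-tier queried dimensions per month
const TOP_K = 20;

const maxStoredEntries = Math.floor(STORED_QUOTA / DIMS);              // 3255
const maxQueriesPerMonth = Math.floor(QUERIED_QUOTA / (TOP_K * DIMS)); // 976
```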

After free tier: $0.04 per million queried dimensions, $0.05 per 100M stored dimensions per month.

The alternative is a dedicated vector DB cluster (Pinecone, Weaviate, Qdrant) with its own auth, network hop, monitoring, billing, and connection pooling. Vectorize reduces all of that to a binding declaration plus an embedding-format-and-metric configuration done once at index creation.
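
The binding declaration is a handful of lines in apps/worker/wrangler.toml; the binding name here is illustrative:

```toml
[[vectorize]]
binding = "VECTORIZE"                    # name exposed on the Worker's env
index_name = "agent-platform-lt-memory"
```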

  • packages/memory/src/vectorize-backed-long-term-memory.ts — the gateway that turns a recall_memory call into a Vectorize search + D1 hydrate
  • packages/memory/src/long-term-memory-storage.ts — the Vectorize-binding interface the gateway uses
  • apps/worker/wrangler.toml — the [[vectorize]] binding
  • No delete-by-metadata. As of mid-2026, Vectorize doesn’t support DELETE WHERE metadata.tenant_id = ?. Wiping a tenant’s vectors requires deleting the entire index and recreating it (which deletes ALL tenants). Acceptable for v1 single-tenant; tracked as follow-up #11. Soft-delete via metadata flag is a workaround if we need it before CF ships the feature.
  • Metadata indexes can’t be added retroactively. Indexes must be declared before any vector is inserted; adding one after the fact requires recreating the index. Phase 1’s setup was a 4-step ritual: create index → create tenant_id metadata index → create agent_id metadata index → start inserting. Documented in the worker README; a command sketch follows this list.
  • Eventually-consistent metadata index propagation. Newly created metadata indexes take a few hundred milliseconds to process (visible as processedUpToMutation in wrangler vectorize info). For a deploy-time setup this is fine.
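
A sketch of that setup ritual with the wrangler CLI (verify flags against the worker README before running):

```sh
# 1. Create the index. Dimensions and metric are fixed at creation time.
npx wrangler vectorize create agent-platform-lt-memory \
  --dimensions=1536 --metric=cosine

# 2–3. Declare both metadata indexes BEFORE inserting any vector;
#      they cannot be added retroactively.
npx wrangler vectorize create-metadata-index agent-platform-lt-memory \
  --property-name=tenant_id --type=string
npx wrangler vectorize create-metadata-index agent-platform-lt-memory \
  --property-name=agent_id --type=string

# 4. Start inserting. Check metadata-index propagation
#    (processedUpToMutation) with:
npx wrangler vectorize info agent-platform-lt-memory
```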