ADR-0029: Embedding-provider abstraction

Status: Accepted
Date: 2026-04-29
Related: ADR-0019 (LLM adapter), ADR-0027 (storage primitives — long-term memory = Vectorize + D1), ADR-0028 (chat-provider abstraction; embeddings split here)

ADR-0028 confirmed that embeddings are out of scope for ModelAdapter: no streaming, no tool calls, no system prompt, batch as the primary mode, often a different provider from chat. They get their own EmbeddingAdapter.

ADR-0027 committed the long-term memory storage shape: D1 holds the source text and metadata; Vectorize holds the embedding; retrieval joins on ID. That ADR explicitly deferred the embedding model and provider question to here.

The long-term memory subsystem cannot ship without this decision. This ADR settles three things: the EmbeddingAdapter interface, the v1 starter provider, and how model identity is tracked alongside every vector so that switching providers later is a designed migration rather than a panic.

The non-obvious thing this ADR has to design around

Embeddings are not portable across models. A vector produced by text-embedding-3-small (1536 dim) cannot be compared, queried, or mixed with a vector from voyage-3 (1024 dim) or from text-embedding-3-large (3072 dim). They live in different semantic spaces. Even within OpenAI’s lineup, switching from text-embedding-3-small to text-embedding-3-large invalidates every existing vector — they’re as incompatible as switching providers.

This has cascading implications:

  1. Every vector is bound to a specific (provider, model, dimension) tuple. That tuple has to be stored alongside the vector and never assumed.
  2. The Vectorize index itself is bound to a single embedding configuration. Vectorize requires you to declare dimensions at index creation; you cannot put 1536-dim and 1024-dim vectors in the same index. Switching embedding models means creating a new index, re-embedding all source rows from D1 into the new index, then cutting over.
  3. Multi-tenancy gets complicated if tenants are on different embedding configurations. Either everyone shares one index (one configuration platform-wide) or each tenant gets their own index. A single shared index is much simpler operationally but means a platform-wide embedding migration touches every tenant simultaneously. v1 commits to platform-wide; per-tenant overrides are deferred.
  4. Cross-tenant retrieval is impossible across configurations. If one tenant is on Voyage and another is on OpenAI, there's no meaningful way to search across them. We don't currently want cross-tenant retrieval, but it's worth flagging that this choice forecloses the option.
  5. Cost of being wrong is high. Picking the wrong starter embedding provider means a re-embedding job before the platform has many memories — manageable. Discovering it three months in with millions of stored vectors — painful but doable. Discovering it across many tenants with different embedding configs — much worse.

The ADR has to make this manageable, not invisible.

| Provider | Models | Dim | Notes |
| --- | --- | --- | --- |
| OpenAI | text-embedding-3-small, text-embedding-3-large, ada-002 | 1536 / 3072 / 1536 | Industry baseline, well-benchmarked, supports Matryoshka truncation on -3-* (truncate dim without re-embedding), API stable, separate billing from chat, solid multilingual coverage. |
| Voyage AI | voyage-3, voyage-3-large, voyage-code-3 | 1024 / 1024 / 1024 | Strong on retrieval benchmarks, designed for RAG, smaller dim = cheaper Vectorize storage. Less battle-tested than OpenAI. |
| Cohere | embed-v3 (English / multilingual) | 1024 | Solid retrieval performance, good multilingual story (relevant: Ganimarka and vetzoo are Swedish). |
| Anthropic | none currently | — | Anthropic does not offer an embedding API as of this writing. Worth tracking but not selectable today. |
| Workers AI | @cf/baai/bge-base-en-v1.5, @cf/baai/bge-large-en-v1.5, etc. | 768 / 1024 | Cloudflare-native — zero egress, low latency, no auth ceremony. Smaller models than the cloud APIs; English-leaning unless using a multilingual variant. |
| Local (sentence-transformers, etc.) | many | varies | Requires lifecycle management (covered by ADR-0028's optional init()/dispose() pattern). Not viable in Workers runtime. |

The EmbeddingAdapter is a separate interface from ModelAdapter. Its public surface:

```ts
interface EmbeddingAdapter {
  /** Stable identifier: provider name, e.g. "openai", "voyage", "workers-ai". */
  readonly provider: string;
  /** Specific model in use, e.g. "text-embedding-3-small". */
  readonly model: string;
  /** Vector dimensionality. Static per (provider, model). */
  readonly dimensions: number;
  /** Maximum input tokens per item the model accepts. */
  readonly maxInputTokens: number;
  /** Maximum number of items per batch call. */
  readonly maxBatchSize: number;

  /**
   * Embed one or more inputs. Batch is the primary mode.
   * Returns vectors in the same order as inputs.
   */
  embed(inputs: string[]): Promise<EmbeddingResult>;

  /** Optional lifecycle hooks for adapters that need them (mirrors ADR-0028). */
  init?(): Promise<void>;
  dispose?(): Promise<void>;
}

interface EmbeddingResult {
  vectors: number[][];
  /** Token usage for cost tracking (optional — not all providers expose it). */
  usage?: { inputTokens: number };
  /** The (provider, model, dimensions) tuple this batch was produced under. */
  config: EmbeddingConfig;
}

interface EmbeddingConfig {
  provider: string;
  model: string;
  dimensions: number;
}
```

Shared with ModelAdapter:

  • LLMError taxonomy from ADR-0028 (RateLimitError, OverloadedError, InvalidRequestError, AuthError, TransientError).
  • Optional init() / dispose() lifecycle pattern.
  • Auth/retry infrastructure at the implementation level, not in the public interface.

Deliberately not shared:

  • No StopReason, no generateStream(), no getCapabilities(). Embeddings are a different operation; collapsing them into one interface would force most fields to be unused on every embedding call.
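
For concreteness, a minimal OpenAIEmbeddingAdapter sketch against OpenAI's /v1/embeddings endpoint. The error classes are ADR-0028's; their message-string constructors, the exact token and batch limits, and the overall shape are assumptions for illustration, not the committed implementation:

```ts
// Illustrative sketch only; the committed implementation lands in
// packages/embeddings-openai (see the implementation plan below).
class OpenAIEmbeddingAdapter implements EmbeddingAdapter {
  readonly provider = "openai";
  readonly model = "text-embedding-3-small";
  readonly dimensions = 1536;
  readonly maxInputTokens = 8191; // OpenAI's published per-input limit for -3-* models
  readonly maxBatchSize = 2048;   // OpenAI's per-request input cap

  constructor(private apiKey: string) {}

  async embed(inputs: string[]): Promise<EmbeddingResult> {
    if (inputs.length > this.maxBatchSize) {
      throw new InvalidRequestError(`batch of ${inputs.length} exceeds ${this.maxBatchSize}`);
    }
    const res = await fetch("https://api.openai.com/v1/embeddings", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${this.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: this.model, input: inputs }),
    });
    // Map HTTP failures into the shared ADR-0028 taxonomy.
    if (res.status === 429) throw new RateLimitError("openai embeddings rate-limited");
    if (res.status === 401) throw new AuthError("bad OPENAI_API_KEY");
    if (!res.ok) throw new TransientError(`openai embeddings HTTP ${res.status}`);
    const body = await res.json() as {
      data: { index: number; embedding: number[] }[];
      usage: { prompt_tokens: number };
    };
    // OpenAI returns one entry per input; sort by index to guarantee input order.
    const vectors = body.data.sort((a, b) => a.index - b.index).map((d) => d.embedding);
    return {
      vectors,
      usage: { inputTokens: body.usage.prompt_tokens },
      config: { provider: this.provider, model: this.model, dimensions: this.dimensions },
    };
  }
}
```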

Model identity is metadata on every vector

Every vector stored in Vectorize and every long-term-memory row in D1 carries the EmbeddingConfig it was produced with:

  • D1 schema for long_term_memory has columns embedding_provider, embedding_model, embedding_dimensions.
  • Vectorize metadata on each vector mirrors these.
  • The retrieval pipeline asserts that the query embedding’s config matches the index’s config; mismatch is a hard error, not a silent miscompare.

This is the single most important design decision in this ADR. It costs three columns of D1 schema and three metadata fields on each vector. It buys: detectable mismatches, replayable re-embeddings (D1 source text is the ground truth), and a clean migration path when the embedding model changes.
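
A sketch of the retrieval-side guard. The function name and error handling are illustrative; the point is that the comparison is structural, not best-effort:

```ts
// Hypothetical guard on the retrieval path: refuse to query an index with a
// query vector produced under a different (provider, model, dimensions) tuple.
function assertConfigMatch(query: EmbeddingConfig, index: EmbeddingConfig): void {
  const mismatch =
    query.provider !== index.provider ||
    query.model !== index.model ||
    query.dimensions !== index.dimensions;
  if (mismatch) {
    // Hard error by design: a silent miscompare would return garbage results.
    throw new Error(
      `embedding config mismatch: query is ${query.provider}/${query.model}/${query.dimensions}, ` +
      `index is ${index.provider}/${index.model}/${index.dimensions}`
    );
  }
}
```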

One Vectorize index per embedding configuration

Vectorize indexes are bound to dimensionality at creation. The platform commits to one index per active embedding configuration, named lt-memory-<provider>-<model> — for the v1 starter (below), the index name is lt-memory-openai-text-embedding-3-small.

The lt- prefix disambiguates long-term memory indexes from any other vector indexes the platform may add later (e.g. caches, retrieval-augmented tool descriptions).

For v1, the platform runs a single embedding configuration platform-wide, so a single index. Multi-config support — different tenants on different providers, A/B testing two embedding models, gradual migration with two indexes coexisting — is preserved by this naming scheme but not built. The runtime knows which index to query by reading the EmbeddingConfig declared in platform config.
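
A hypothetical helper capturing the naming scheme (illustrative; the real derivation may live in platform config rather than code):

```ts
// Derives the Vectorize index name from an EmbeddingConfig.
// e.g. { provider: "openai", model: "text-embedding-3-small", dimensions: 1536 }
//   -> "lt-memory-openai-text-embedding-3-small"
function ltMemoryIndexName(config: EmbeddingConfig): string {
  return `lt-memory-${config.provider}-${config.model}`;
}
```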

Platform-wide configuration; per-tenant deferred

v1 is one embedding configuration shared by all tenants. Per-tenant overrides are deferred until a second non-Swedish-e-commerce tenant arrives, or until a real workload-specific need emerges. The interface and naming scheme make per-tenant overrides additive when needed.

Migration is a designed operation, not built yet

The ADR commits to making re-embedding possible, not to building the tool now:

  • D1 source-of-truth invariant from ADR-0027 means re-embedding never loses information — every vector has a corresponding D1 row whose text can be re-embedded.
  • The EmbeddingConfig metadata makes it possible to identify which vectors belong to which configuration during a migration window.
  • A future re-embedding job iterates D1 rows, embeds with the new config, writes to a new Vectorize index, and once validated, the runtime config flips to point at the new index.

This is a Phase 2 or later operational concern. For now: the foundation supports it, the tooling doesn’t exist.
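
Even so, the shape of that future job is already clear. A sketch, assuming hypothetical helpers for iterating D1 rows and upserting into the new Vectorize index (all names illustrative):

```ts
// Hypothetical re-embedding job (Phase 2+, not built yet): walk D1 source
// text, embed with the new adapter, write to the new index in batches.
async function reembedAll(
  rows: AsyncIterable<{ id: string; text: string }>, // D1 source-of-truth rows
  adapter: EmbeddingAdapter,                          // new embedding config
  upsert: (id: string, vector: number[], config: EmbeddingConfig) => Promise<void>,
): Promise<void> {
  let batch: { id: string; text: string }[] = [];
  for await (const row of rows) {
    batch.push(row);
    if (batch.length === adapter.maxBatchSize) await flush();
  }
  await flush();

  async function flush(): Promise<void> {
    if (batch.length === 0) return;
    const { vectors, config } = await adapter.embed(batch.map((r) => r.text));
    await Promise.all(batch.map((r, i) => upsert(r.id, vectors[i], config)));
    batch = [];
  }
}
```

Once the new index is validated, the runtime config flips to point at it, per the third bullet above.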

Starter provider: OpenAI text-embedding-3-small

For v1, the long-term memory subsystem starts on OpenAI text-embedding-3-small (1536 dimensions).

Reasoning, in priority order:

  1. Multilingual coverage matters now, not later. Ganimarka and the planned vetzoo onboarding are Swedish. The product descriptions, customer messages, and operator notes will not be exclusively English. Starting on a model with poor multilingual support means the first symptom of trouble is “long-term memory feels broken” — vague, hard to debug, eroding trust in the platform. OpenAI’s -3-* line handles multilingual content acceptably.
  2. Industry-standard, best-documented, most stable API. The right starting surface to learn what the platform actually wants from an embedding provider before optimizing.
  3. Cost is not meaningful at v1 volumes. Embedding pricing is a small fraction of chat-completion pricing; expected monthly cost is single-digit dollars at current memory volumes.
  4. Matryoshka truncation is supported on -3-* models — vectors can be stored at full dimension and queried at truncated dimension (e.g., 512 or 768) without re-embedding; see the sketch after this list. This is a real future-flexibility benefit that BGE and Voyage do not offer.
  5. Switching cost is symmetric. A future migration from OpenAI to Workers AI BGE, or to Voyage, costs the same as a migration from BGE to OpenAI — a re-embedding job over D1 source text. Starting on OpenAI does not increase lock-in versus starting elsewhere; the migration foundation is the same.
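
The truncation mechanics point 4 refers to are simple: slice the leading dimensions, then L2-renormalize so cosine similarity stays meaningful. A sketch (standard Matryoshka recipe; not built for v1):

```ts
// Matryoshka truncation: keep the first k dimensions and re-normalize to
// unit length. Both stored and query vectors must be truncated to the same k.
function truncate(vector: number[], k: number): number[] {
  const head = vector.slice(0, k);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  return head.map((x) => x / norm);
}

// e.g. embed once at 1536 dims, derive a cheaper 512-dim representation:
// const small = truncate(fullVector, 512);
```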

Considered and rejected for v1:

  • Workers AI @cf/baai/bge-base-en-v1.5. Operationally simpler (no API key, no egress, in-isolate latency) and was the initial recommendation in this ADR’s drafting. Rejected because BGE-base is English-leaning and the customer base is Swedish-language. The “long-term memory feels broken” failure mode on multilingual content is harder to debug than the “OpenAI rate-limited us” failure mode. Workers AI BGE remains a perfectly reasonable migration target once we have measured workload data; we’re better positioned to make that call after shipping on a known-good model than on a possibly-marginal one.
  • OpenAI text-embedding-3-large (3072 dim). Marginally better recall, 2× the storage cost, slower inference. Overkill for v1 volumes. Revisit if recall is measured to be insufficient.
  • Voyage voyage-3 / Cohere embed-v3-multilingual. Both plausible. Voyage benchmarks well; Cohere has strong multilingual. Rejected on “less battle-tested than OpenAI” — better to start on the most predictable surface and switch if a measured need emerges.
  • Anthropic. Does not offer an embedding API as of this writing.

Acknowledged costs of choosing OpenAI:

  • A third-party API call on the long-term-memory ingest and retrieval paths. Network egress and roundtrip latency we wouldn’t pay with Workers AI.
  • A new Worker secret (OPENAI_API_KEY) to manage.
  • A new billing relationship to monitor.
  • Exposure to OpenAI’s rate limits and outages.

These are bounded, well-understood costs. None of them is novel.

Becomes easy:

  • Long-term memory subsystem can ship: clear interface, clear starter provider, clear storage shape (per ADR-0027).
  • Switching embedding providers later is a designed migration, not a system replacement.
  • Detecting embedding-config mismatches at retrieval is structural, not best-effort.
  • Adding a second embedding adapter (e.g., WorkersAIEmbeddingAdapter, VoyageEmbeddingAdapter) is roughly a one-day commit when there’s a reason — same shape as ADR-0028’s pattern for chat adapters.

Becomes hard / accepted tradeoffs:

  • Long-term memory ingest and retrieval paths now include a third-party network call. Not on the synchronous request path for the merchandising agent (which is async), but will be for any future sync-response agent that retrieves memories.
  • New OpenAI billing relationship and rate-limit exposure.
  • Migration tooling has to be built before the first real model swap. Not now.
  • Every vector pays a small storage cost for the EmbeddingConfig metadata. Trivial.

Explicitly deferred:

  • Concrete EmbeddingAdapter implementations beyond OpenAIEmbeddingAdapter. (Workers AI, Voyage, Cohere come when there’s a reason — recall complaint, latency complaint, cost concern, or a customer constraint.)
  • Re-embedding migration tooling.
  • Tenant-level embedding-configuration overrides.
  • Hybrid search (semantic + keyword), reranking, MMR.
  • Query-time embedding caching (the same query embedded repeatedly).
  • Multi-vector / late-interaction models (ColBERT-style).
  • Embedding for non-text modalities (images, audio).
  • Matryoshka truncation as a runtime feature. The model supports it; the platform does not exploit it for v1.

Revisit when:

  • OpenAI rate limits or outage exposure becomes a real operational pain → add WorkersAIEmbeddingAdapter as fallback or migration target.
  • Long-term memory retrieval latency dominates a sync request path → consider Workers AI for in-isolate latency, or Matryoshka truncation for faster queries.
  • Recall quality measured to be insufficient on real workloads → benchmark text-embedding-3-large, Voyage, Cohere; plan migration.
  • A second tenant onboards with a materially different content domain (e.g., code, audio transcripts) → consider per-tenant embedding configuration; build the override mechanism.
  • OpenAI cost becomes meaningful at scale (unlikely soon) → migrate to Workers AI or self-hosted.
  • Anthropic ships a competitive embedding API → consider migrating for vendor consolidation.

Implementation plan (for follow-up commits, not this ADR)

In rough order, each its own commit:

  1. New package packages/embeddings: EmbeddingAdapter interface, EmbeddingConfig type, MockEmbeddingAdapter for tests (sketched below).
  2. New package packages/embeddings-openai: OpenAIEmbeddingAdapter implementing the interface against OpenAI’s /v1/embeddings endpoint. Tests against the mock and (gated) integration tests against the live API.
  3. D1 migration: add embedding_provider, embedding_model, embedding_dimensions columns to whatever long_term_memory schema lands. Vectorize index lt-memory-openai-text-embedding-3-small created at deploy.
  4. Long-term memory subsystem itself — its own ADR if the design has non-trivial decisions, or just an implementation if the design is straightforward by then.

Worker secret to add at deploy: OPENAI_API_KEY.
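
As a sketch of item 1’s MockEmbeddingAdapter — a deterministic stand-in for tests; the hashing scheme and limits here are illustrative, not prescribed:

```ts
// Deterministic fake: folds each input's char codes into a fixed-dimension
// pseudo-vector, so tests get stable, order-preserving output with no network.
class MockEmbeddingAdapter implements EmbeddingAdapter {
  readonly provider = "mock";
  readonly model = "mock-1";
  readonly dimensions = 8;
  readonly maxInputTokens = 8192;
  readonly maxBatchSize = 100;

  async embed(inputs: string[]): Promise<EmbeddingResult> {
    const vectors = inputs.map((text) => {
      const v = new Array<number>(this.dimensions).fill(0);
      for (let i = 0; i < text.length; i++) {
        v[i % this.dimensions] += text.charCodeAt(i) / 1000;
      }
      return v;
    });
    return {
      vectors,
      config: { provider: this.provider, model: this.model, dimensions: this.dimensions },
    };
  }
}
```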