OpenAI
OpenAI is used for embeddings only — the vector representations that back semantic search in long-term memory. Agent reasoning runs on Anthropic; embeddings on OpenAI.
This split is deliberate. Different providers, different problems, different abstractions in the codebase.
What we use it for
One model: text-embedding-3-small. 1536 dimensions, cosine-friendly.
Whenever an agent calls recall_memory("..."), the runtime:
- Calls OpenAI’s embeddings endpoint with the query text
- Receives back a 1536-dim vector
- Searches Vectorize using that vector
- Hydrates the matches from D1
- Returns the result to the agent
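A minimal sketch of that flow, assuming Workers bindings for Vectorize and D1 and an adapter with an `embed()` method. Names here are illustrative; the real wiring lives in `apps/worker/src/gateways.ts`:

```ts
// Sketch of the recall path. Binding names, the table name, and the adapter
// shape are assumptions for illustration, not the platform's actual code.
import type { VectorizeIndex, D1Database } from "@cloudflare/workers-types";

interface EmbeddingAdapter {
  embed(text: string): Promise<number[]>; // 1536-dim vector from text-embedding-3-small
}

async function recallMemory(
  query: string,
  embedder: EmbeddingAdapter,
  vectorize: VectorizeIndex,
  db: D1Database,
): Promise<string[]> {
  // 1. Embed the query text via OpenAI.
  const vector = await embedder.embed(query);

  // 2. Nearest-neighbour search in Vectorize.
  const { matches } = await vectorize.query(vector, { topK: 5 });
  const ids = matches.map((m) => m.id);
  if (ids.length === 0) return [];

  // 3. Hydrate the matched entries from D1.
  const placeholders = ids.map(() => "?").join(", ");
  const rows = await db
    .prepare(`SELECT content FROM memories WHERE id IN (${placeholders})`)
    .bind(...ids)
    .all<{ content: string }>();

  // 4. Return the memory contents to the agent.
  return rows.results.map((r) => r.content);
}
```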
Every long-term memory entry was also embedded at write time — the seed handler invokes the embedder for each entry it stores.
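The write-time half looks similar in miniature (again illustrative, reusing the types from the sketch above): embed each entry once and upsert the vector into Vectorize under the same id as the D1 row.

```ts
// Sketch of write-time embedding during seeding; names are illustrative.
async function seedMemory(
  entry: { id: string; content: string },
  embedder: EmbeddingAdapter,
  vectorize: VectorizeIndex,
  db: D1Database,
): Promise<void> {
  // Store the raw entry in D1 so recall can hydrate it later.
  await db
    .prepare("INSERT OR REPLACE INTO memories (id, content) VALUES (?, ?)")
    .bind(entry.id, entry.content)
    .run();

  // Embed the content and upsert the vector under the same id.
  const values = await embedder.embed(entry.content);
  await vectorize.upsert([{ id: entry.id, values }]);
}
```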
Why we picked it
The decision was: which embedding provider?
| Option | Verdict |
|---|---|
| OpenAI text-embedding-3-small | Chosen. Strong quality at 1536 dims; cheap; production-tested; well-documented. |
| OpenAI text-embedding-3-large | Higher quality at 3072 dims, but doubles Vectorize cost (each stored vector is twice as large). Not worth it for Phase 1’s recall quality bar. |
| Anthropic embeddings | Anthropic doesn’t offer embeddings as a first-party API today. |
| Cohere embed-v3 | Strong alternative. Adds a second AI vendor for similar quality. |
| Self-hosted (sentence-transformers, BGE) | Higher operational burden for small quality wins. |
text-embedding-3-small won as the obvious cheap-and-good default. The platform's `EmbeddingAdapter` interface (in `packages/embeddings/`) means swapping providers is a one-file change. ADR-0029 covers the abstraction; ADR-0030 covers the recall pattern.
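As a sketch of what that one-file change means in practice (package and class names here are assumptions, not the real exports):

```ts
// Hypothetical wiring in the spirit of apps/worker/src/gateways.ts.
// Everything else depends only on the EmbeddingAdapter interface, so a
// provider swap touches this one construction site.
import type { EmbeddingAdapter } from "@agent-platform/embeddings";         // assumed export
import { OpenAIEmbeddingAdapter } from "@agent-platform/embeddings-openai"; // assumed export

export function makeEmbedder(env: { OPENAI_API_KEY: string }): EmbeddingAdapter {
  return new OpenAIEmbeddingAdapter({
    apiKey: env.OPENAI_API_KEY,
    model: "text-embedding-3-small",
  });
  // Swapping to Cohere or a self-hosted model replaces only this return value.
}
```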
What it costs
OpenAI embeddings pricing:
| Model | Per million tokens |
|---|---|
| text-embedding-3-small | $0.02 |
| text-embedding-3-large | $0.13 |
Each Phase 1 seed entry is ~50-100 tokens; 10 entries × 100 tokens × $0.02/1M ≈ $0.00002 per full seed run. Each recall query is ~10-20 tokens × $0.02/1M ≈ $0.0000004 per query.
Embedding costs are essentially free at our scale.
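The same back-of-the-envelope arithmetic, written out (the token counts are the rough estimates above, not measurements):

```ts
// Cost estimate for text-embedding-3-small at $0.02 per 1M tokens.
const PRICE_PER_TOKEN = 0.02 / 1_000_000;

const seedRun = 10 * 100 * PRICE_PER_TOKEN; // 10 entries × ~100 tokens ≈ $0.00002
const recall = 20 * PRICE_PER_TOKEN;        // one ~20-token query   ≈ $0.0000004

console.log({ seedRun, recall });
```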
What it replaces
A self-hosted embedding service (e.g. running sentence-transformers on a GPU box). For an embedding model trained on the public web, running our own would be silly: the API is cheaper than the electricity bill.
Where to look
- `packages/embeddings/` — the provider-agnostic `EmbeddingAdapter` interface, plus a `MockEmbeddingAdapter` for tests
- `packages/embeddings-openai/` — the OpenAI implementation (sketched after this list); uses raw `fetch` (no SDK dependency); translates errors into the `@agent-platform/llm` error taxonomy
- `apps/worker/src/gateways.ts` — wires the OpenAI adapter into the long-term memory subsystem
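A minimal sketch of the raw-`fetch` shape of that adapter, with a placeholder error type standing in for the real `@agent-platform/llm` taxonomy:

```ts
// Sketch of an OpenAI embeddings call over raw fetch (no SDK).
// EmbeddingProviderError stands in for the real @agent-platform/llm error types.
class EmbeddingProviderError extends Error {}

export class OpenAIEmbeddingAdapter {
  private apiKey: string;
  private model: string;

  constructor(opts: { apiKey: string; model?: string }) {
    this.apiKey = opts.apiKey;
    this.model = opts.model ?? "text-embedding-3-small";
  }

  async embed(text: string): Promise<number[]> {
    const res = await fetch("https://api.openai.com/v1/embeddings", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${this.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: this.model, input: text }),
    });

    if (!res.ok) {
      // The real adapter maps status codes (429, 5xx, ...) onto the shared taxonomy.
      throw new EmbeddingProviderError(`OpenAI embeddings request failed: ${res.status}`);
    }

    const json = (await res.json()) as { data: { embedding: number[] }[] };
    return json.data[0].embedding;
  }
}
```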
Trade-offs we accepted
- Vendor concentration on embeddings. If OpenAI has an outage, recall fails. The `EmbeddingAdapter` abstraction means we can swap providers, but we don't do that automatically.
- Direct fetch, no SDK. The OpenAI SDK has its own quirks (Node-only assumptions, large dep graph). On Cloudflare Workers, raw `fetch` is cleaner. Trade-off: we re-implement small bits (request shape, error translation) instead of getting them for free. Worth it to avoid the SDK.
- One bug class we hit at deploy time. The Workers `fetch` this-binding gotcha (illustrated below), fixed in commit 11. Documented as a cautionary tale in the embeddings adapter. Same bug class as the one we hit in the Shopify client previously.
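For the record, the bug class looks roughly like this (an illustrative reproduction, not the actual code from commit 11): `fetch` called with the wrong `this` throws "Illegal invocation" on Workers.

```ts
// Illustrative reproduction of the Workers fetch this-binding gotcha.
class HttpClient {
  // BUG: `fetchFn = fetch;` detaches fetch from the global scope, so
  // `this.fetchFn(url)` invokes it with `this === client` and Workers
  // rejects the call with "Illegal invocation".

  // FIX: bind it to globalThis (or wrap it in an arrow function).
  private fetchFn = fetch.bind(globalThis);

  get(url: string): Promise<Response> {
    return this.fetchFn(url);
  }
}
```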