OpenAI

OpenAI is used for embeddings only — the vector representations that back semantic search in long-term memory. Agent reasoning runs on Anthropic; embeddings on OpenAI.

This split is deliberate. Different providers, different problems, different abstractions in the codebase.

One model: text-embedding-3-small. 1536 dimensions, cosine-friendly.

Whenever an agent calls recall_memory("..."), the runtime does the following (a sketch follows the list):

  1. Calls OpenAI’s embeddings endpoint with the query text
  2. Receives back a 1536-dim vector
  3. Searches Vectorize using that vector
  4. Hydrates the matches from D1
  5. Returns the result to the agent
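
A minimal sketch of that path, assuming Cloudflare bindings named OPENAI_API_KEY, VECTORIZE, and DB, plus a D1 table memories(id, content). All names are illustrative, not the actual worker code; types come from @cloudflare/workers-types.

```ts
interface Env {
  OPENAI_API_KEY: string;
  VECTORIZE: VectorizeIndex; // Cloudflare Vectorize binding
  DB: D1Database;            // Cloudflare D1 binding
}

async function recallMemory(env: Env, query: string, topK = 5) {
  // Steps 1–2: embed the query text via OpenAI's embeddings endpoint.
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "text-embedding-3-small", input: query }),
  });
  if (!res.ok) throw new Error(`embeddings request failed: ${res.status}`);
  const body = (await res.json()) as { data: { embedding: number[] }[] };
  const vector = body.data[0].embedding; // 1536 dimensions

  // Step 3: nearest-neighbor search in Vectorize.
  const { matches } = await env.VECTORIZE.query(vector, { topK });
  if (matches.length === 0) return [];

  // Step 4: hydrate the matched ids from D1.
  const placeholders = matches.map(() => "?").join(", ");
  const { results } = await env.DB
    .prepare(`SELECT id, content FROM memories WHERE id IN (${placeholders})`)
    .bind(...matches.map((m) => m.id))
    .all();

  // Step 5: hand the hydrated rows back to the agent.
  return results;
}
```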

Every long-term memory entry was also embedded at write time — the seed handler invokes the embedder for each entry it stores.
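
The write path is the mirror image. A sketch under the same assumed Env and table as above; the embed parameter stands in for whatever the seed handler actually calls.

```ts
async function storeMemory(
  env: Env,
  id: string,
  content: string,
  embed: (text: string) => Promise<number[]>, // the embedder, in effect
) {
  const values = await embed(content); // 1536-dim vector, computed once at write time
  await env.DB
    .prepare("INSERT INTO memories (id, content) VALUES (?, ?)")
    .bind(id, content)
    .run();
  // Store the vector in Vectorize under the same id used in D1.
  await env.VECTORIZE.upsert([{ id, values }]);
}
```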

The decision: which embedding provider?

  • OpenAI text-embedding-3-small: Chosen. Strong quality at 1536 dims; cheap; production-tested; well-documented.
  • OpenAI text-embedding-3-large: Higher quality at 3072 dims, but doubles Vectorize cost (each stored vector is twice as large). Not worth it for Phase 1’s recall quality bar.
  • Anthropic embeddings: Ruled out; Anthropic doesn’t offer embeddings as a first-party API today.
  • Cohere embed-v3: Strong alternative, but adds a second AI vendor for similar quality.
  • Self-hosted (sentence-transformers, BGE): Higher operational burden for small quality wins.

text-embedding-3-small won as the obvious cheap-and-good default. The platform’s EmbeddingAdapter interface (in packages/embeddings/) means swapping providers is a one-file change. ADR-0029 covers the abstraction; ADR-0030 covers the recall pattern.
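
The real interface isn’t reproduced here, but a plausible shape, including a deterministic test fake in the spirit of the MockEmbeddingAdapter mentioned below, might look like this (details are assumptions, not the definition in packages/embeddings/):

```ts
export interface EmbeddingAdapter {
  /** Output dimensionality, e.g. 1536 for text-embedding-3-small. */
  readonly dimensions: number;
  /** Embed a batch of texts; returns one vector per input, in order. */
  embed(texts: string[]): Promise<number[][]>;
}

// Deterministic fake for tests: hashes characters into vector positions,
// so identical texts always produce identical vectors.
export class MockEmbeddingAdapter implements EmbeddingAdapter {
  readonly dimensions = 1536;
  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const v = new Array<number>(this.dimensions).fill(0);
      for (let i = 0; i < text.length; i++) {
        v[(i * 31 + text.charCodeAt(i)) % this.dimensions] += 1;
      }
      return v;
    });
  }
}
```

Swapping providers then means writing one new implementation of this interface and changing the wiring in apps/worker/src/gateways.ts.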

OpenAI embeddings pricing:

  • text-embedding-3-small: $0.02 per million tokens
  • text-embedding-3-large: $0.13 per million tokens

Each Phase 1 seed entry is ~50-100 tokens; 10 entries × 100 tokens × $0.02/1M ≈ $0.00002 per full seed run. Each recall query is ~10-20 tokens × $0.02/1M ≈ $0.0000004 per query.
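
The same back-of-envelope numbers, computed (token counts are the estimates above):

```ts
const pricePerToken = 0.02 / 1_000_000; // text-embedding-3-small, USD

const seedRun = 10 * 100 * pricePerToken; // 10 entries × ~100 tokens each
const recall = 20 * pricePerToken;        // one ~20-token query

console.log(seedRun); // 0.00002
console.log(recall);  // 4e-7
```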

Embedding costs are essentially free at our scale.

We deliberately did not build a self-hosted embedding service (e.g. running sentence-transformers on a GPU box). For an embedding model trained on the public web, running our own would be silly: the API is cheaper than the electricity bill.

  • packages/embeddings/ — the provider-agnostic EmbeddingAdapter interface, plus a MockEmbeddingAdapter for tests
  • packages/embeddings-openai/ — the OpenAI implementation; uses raw fetch (no SDK dependency); translates errors into the @agent-platform/llm error taxonomy
  • apps/worker/src/gateways.ts — wires the OpenAI adapter into the long-term memory subsystem
  • Vendor concentration on embeddings. If OpenAI has an outage, recall fails. The EmbeddingAdapter abstraction means we can swap providers, but we don’t do that automatically.
  • Direct fetch, no SDK. The OpenAI SDK has its own quirks (Node-only assumptions, large dep graph). On Cloudflare Workers, raw fetch is cleaner. Trade-off: we re-implement small bits (request shape, error translation) instead of getting them free. Worth it to avoid the SDK.
  • One bug class we hit at deploy time. The Workers fetch this-binding gotcha — fixed in commit 11. Documented as a cautionary tale in the embeddings adapter (see the sketch below); same bug class as the one we hit in the Shopify client previously.
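
A minimal illustration of that bug class (illustrative code, not the actual adapter): a bare fetch reference called through another object loses its required receiver, and the runtime rejects the call with "TypeError: Illegal invocation".

```ts
class BrokenHttpClient {
  private fetchFn = fetch; // BUG: detached reference, receiver lost

  get(url: string) {
    // Called as this.fetchFn(...), so `this` inside fetch is the client
    // instance instead of the global scope: Illegal invocation.
    return this.fetchFn(url);
  }
}

class FixedHttpClient {
  private fetchFn = fetch.bind(globalThis); // FIX: pin the receiver
  // Equivalent fix: (input: RequestInfo, init?: RequestInit) => fetch(input, init)

  get(url: string) {
    return this.fetchFn(url);
  }
}
```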