OpenAI

OpenAI is used for embeddings only — the vector representations that back semantic search in long-term memory. Agent reasoning runs on Anthropic; embeddings on OpenAI.

This split is deliberate. Different providers, different problems, different abstractions in the codebase.

One model: text-embedding-3-small. 1536 dimensions, cosine-friendly.

Whenever an agent calls recall_memory("..."), the runtime does the following (a sketch follows the list):

  1. Calls OpenAI’s embeddings endpoint with the query text
  2. Receives back a 1536-dim vector
  3. Searches Vectorize using that vector
  4. Hydrates the matches from D1
  5. Returns the result to the agent
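
A minimal sketch of that path, assuming Cloudflare bindings named OPENAI_API_KEY, VECTORIZE, and DB, plus a D1 table memories(id, content). All names are illustrative, not the actual worker code; types come from @cloudflare/workers-types.

```ts
interface Env {
  OPENAI_API_KEY: string;
  VECTORIZE: VectorizeIndex; // Cloudflare Vectorize binding
  DB: D1Database;            // Cloudflare D1 binding
}

async function recallMemory(env: Env, query: string, topK = 5) {
  // Steps 1–2: embed the query text via OpenAI's embeddings endpoint.
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "text-embedding-3-small", input: query }),
  });
  if (!res.ok) throw new Error(`embeddings request failed: ${res.status}`);
  const body = (await res.json()) as { data: { embedding: number[] }[] };
  const vector = body.data[0].embedding; // 1536 dimensions

  // Step 3: nearest-neighbor search in Vectorize.
  const { matches } = await env.VECTORIZE.query(vector, { topK });
  if (matches.length === 0) return [];

  // Step 4: hydrate the matched ids from D1.
  const placeholders = matches.map(() => "?").join(", ");
  const { results } = await env.DB
    .prepare(`SELECT id, content FROM memories WHERE id IN (${placeholders})`)
    .bind(...matches.map((m) => m.id))
    .all();

  // Step 5: hand the hydrated rows back to the agent.
  return results;
}
```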

Every long-term memory entry was also embedded at write time — the seed handler invokes the embedder for each entry it stores.
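
The write path is the mirror image. A sketch under the same assumed Env and table as above; the embed parameter stands in for whatever the seed handler actually calls.

```ts
async function storeMemory(
  env: Env,
  id: string,
  content: string,
  embed: (text: string) => Promise<number[]>, // the embedder, in effect
) {
  const values = await embed(content); // 1536-dim vector, computed once at write time
  await env.DB
    .prepare("INSERT INTO memories (id, content) VALUES (?, ?)")
    .bind(id, content)
    .run();
  // Store the vector in Vectorize under the same id used in D1.
  await env.VECTORIZE.upsert([{ id, values }]);
}
```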

The decision: which embedding provider?

  • OpenAI text-embedding-3-small: Chosen. Strong quality at 1536 dims; cheap; production-tested; well-documented.
  • OpenAI text-embedding-3-large: Higher quality at 3072 dims, but doubles Vectorize cost (each stored vector is twice as large). Not worth it for Phase 1’s recall quality bar.
  • Anthropic embeddings: Ruled out; Anthropic doesn’t offer embeddings as a first-party API today.
  • Cohere embed-v3: Strong alternative, but adds a second AI vendor for similar quality.
  • Self-hosted (sentence-transformers, BGE): Higher operational burden for small quality wins.

text-embedding-3-small won as the obvious cheap-and-good default. The platform’s EmbeddingAdapter interface (in packages/embeddings/) means swapping providers is a one-file change. ADR-0029 covers the abstraction; ADR-0030 covers the recall pattern.
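
The real interface isn’t reproduced here, but a plausible shape, including a deterministic test fake in the spirit of the MockEmbeddingAdapter mentioned below, might look like this (details are assumptions, not the definition in packages/embeddings/):

```ts
export interface EmbeddingAdapter {
  /** Output dimensionality, e.g. 1536 for text-embedding-3-small. */
  readonly dimensions: number;
  /** Embed a batch of texts; returns one vector per input, in order. */
  embed(texts: string[]): Promise<number[][]>;
}

// Deterministic fake for tests: hashes characters into vector positions,
// so identical texts always produce identical vectors.
export class MockEmbeddingAdapter implements EmbeddingAdapter {
  readonly dimensions = 1536;
  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const v = new Array<number>(this.dimensions).fill(0);
      for (let i = 0; i < text.length; i++) {
        v[(i * 31 + text.charCodeAt(i)) % this.dimensions] += 1;
      }
      return v;
    });
  }
}
```

Swapping providers then means writing one new implementation of this interface and changing the wiring in apps/worker/src/gateways.ts.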

OpenAI embeddings pricing:

  • text-embedding-3-small: $0.02 per million tokens
  • text-embedding-3-large: $0.13 per million tokens

Each Phase 1 seed entry is ~50-100 tokens; 10 entries × 100 tokens × $0.02/1M ≈ $0.00002 per full seed run. Each recall query is ~10-20 tokens × $0.02/1M ≈ $0.0000004 per query.
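
The same back-of-envelope numbers, computed (token counts are the estimates above):

```ts
const pricePerToken = 0.02 / 1_000_000; // text-embedding-3-small, USD

const seedRun = 10 * 100 * pricePerToken; // 10 entries × ~100 tokens each
const recall = 20 * pricePerToken;        // one ~20-token query

console.log(seedRun); // 0.00002
console.log(recall);  // 4e-7
```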

Embedding costs are essentially free at our scale.

We deliberately did not build a self-hosted embedding service (e.g. running sentence-transformers on a GPU box). For an embedding model trained on the public web, running our own would be silly: the API is cheaper than the electricity bill.

  • packages/embeddings/ — the provider-agnostic EmbeddingAdapter interface, plus a MockEmbeddingAdapter for tests
  • packages/embeddings-openai/ — the OpenAI implementation; uses raw fetch (no SDK dependency); translates errors into the @agent-platform/llm error taxonomy
  • apps/worker/src/gateways.ts — wires the OpenAI adapter into the long-term memory subsystem
  • Vendor concentration on embeddings. If OpenAI has an outage, recall fails. The EmbeddingAdapter abstraction means we can swap providers, but we don’t do that automatically.
  • Direct fetch, no SDK. The OpenAI SDK has its own quirks (Node-only assumptions, large dep graph). On Cloudflare Workers, raw fetch is cleaner. Trade-off: we re-implement small bits (request shape, error translation) instead of getting them free. Worth it to avoid the SDK.
  • One bug class we hit at deploy time. The Workers fetch this-binding gotcha — fixed in commit 11. Documented as a cautionary tale in the embeddings adapter (see the sketch below); same bug class as the one we hit in the Shopify client previously.
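
A minimal illustration of that bug class (illustrative code, not the actual adapter): a bare fetch reference called through another object loses its required receiver, and the runtime rejects the call with "TypeError: Illegal invocation".

```ts
class BrokenHttpClient {
  private fetchFn = fetch; // BUG: detached reference, receiver lost

  get(url: string) {
    // Called as this.fetchFn(...), so `this` inside fetch is the client
    // instance instead of the global scope: Illegal invocation.
    return this.fetchFn(url);
  }
}

class FixedHttpClient {
  private fetchFn = fetch.bind(globalThis); // FIX: pin the receiver
  // Equivalent fix: (input: RequestInfo, init?: RequestInit) => fetch(input, init)

  get(url: string) {
    return this.fetchFn(url);
  }
}
```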