ADR-0030 — Long-term memory access pattern

The hardest design problem of Phase 1. How agents read and write persistent memory; how per-tenant scoping is enforced; how the embedding adapter interacts with Vectorize and D1.

By the time this ADR was written, ADR-0027 had committed long-term memory = D1 (source of truth) + Vectorize (vector index, joined by ID). ADR-0029 had committed the embedding adapter interface. Neither settled how agents use long-term memory at runtime.

This ADR settles six questions that were left open:

  1. Access pattern. Pre-turn retrieval vs. tool-based retrieval vs. both?
  2. Per-tenant scoping mechanism. Working memory was per-tenant via the per-job Durable Object. Long-term memory has no DO scoping it; the mechanism must be explicit.
  3. Embedding-on-write coupling. Synchronous on the write path or async via queue?
  4. Read-time embedding. The same question, mirrored on the read path.
  5. Gateway shape. What does the LongTermMemoryGateway interface look like beyond the trivial store / search / delete?
  6. The contract from core/memory.ts. Does the existing LongTermMemory interface survive contact with the implementation?

Long-term memory is the single feature that makes agents genuinely “remember” in a useful way. Working memory is ephemeral — it disappears at turn end. Without long-term memory, every turn starts from zero context about what happened in past runs.

The order-triage scenario depends on this. The refund_decision agent calls recall_memory("refund history for sara@example.com") and gets back a list of past refunds with metadata. The decision the agent then makes — auto-approve, escalate, decline — is informed by what it found.

If the access pattern is wrong, the feature doesn’t work. If the tenant-scoping mechanism is wrong, one tenant’s data leaks to another’s queries. If the embedding-on-write coupling is wrong, write latency makes the feature unusable.

Access pattern: agent-driven via tools, not pre-turn retrieval

The runtime does not retrieve memories before the turn and stuff them into the prompt. Instead, the agent’s tool list includes recall_memory(query, top_k?) — the agent decides when to call it.

Why:

  • Selective. Most turns don’t need memory recall; pre-fetching for every turn wastes embedding cost and prompt space.
  • Explicit. When the agent decides to recall, it’s a visible tool call in the trace. Debugging is straightforward.
  • Composable with delegation. A sub-agent’s tool list includes its own recall_memory scoped to its own agent_id. No special pre-turn logic for sub-agents.
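
To make the pattern concrete, here is a rough sketch of how the recall_memory tool could appear in the agent’s tool list. The variable name and schema shape are illustrative, not taken from the runtime:

// Illustrative tool definition. The agent controls only query and top_k;
// tenant and agent scoping is applied by the runtime's handler, not here.
const recallMemoryTool = {
  name: "recall_memory",
  description: "Search this agent's long-term memory for relevant past entries.",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string", description: "Natural-language description of what to recall." },
      top_k: { type: "number", description: "Maximum number of matches to return (optional)." },
    },
    required: ["query"],
  },
} as const;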

Per-tenant scoping: structural, via Vectorize metadata indexes

Vectorize supports metadata indexes; we declare two (tenant_id, agent_id) when we create the index. Every recall_memory call filters on both before the similarity search runs. The runtime composes these filters from the context bundle’s tenant_id and the calling agent’s agent_id; the agent itself cannot specify them.

This is the security property: a malicious or buggy agent cannot retrieve another tenant’s memories. The filter is applied below the agent’s reach.
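
A minimal sketch of how the runtime might compose that call. The handler name, context-bundle field names, and SearchQuery shape are assumptions (one plausible SearchQuery is sketched later alongside the gateway contract):

// Illustrative tool handler. The agent supplies only query and top_k;
// tenant_id and agent_id come from the runtime's context bundle.
async function handleRecallMemory(
  ctx: { tenantId: string; agentId: string },  // from the context bundle, never from the agent
  args: { query: string; top_k?: number },     // the only fields the agent controls
  gateway: LongTermMemory,
) {
  return gateway.search({
    query: args.query,
    topK: args.top_k ?? 5,
    tenantId: ctx.tenantId,  // structural scoping, applied below the agent's reach
    agentId: ctx.agentId,
  });
}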

When the agent calls store_memory(content, metadata), the gateway:

  1. Embeds content synchronously via the embedding adapter
  2. Generates a ULID
  3. Inserts the row into D1 (content + metadata)
  4. Inserts the vector into Vectorize (just embedding + metadata for filtering)
  5. Returns the ULID

If any step fails, no entry exists. We considered async-via-queue and rejected it: the failure semantics are weaker (an entry exists in D1 but is unsearchable), and Phase 1’s write volume is low enough that synchronous is fine.
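
A sketch of that write path, written as a free function rather than a gateway method for brevity. The binding names (DB, VECTORS), the Workers types, the embed() adapter method, the table schema, the ulid package, and the StoreInput shape (sketched later) are all assumptions, not commitments from this ADR:

import { ulid } from "ulid";

// Illustrative synchronous write path: every step awaits before the ULID is returned.
async function storeMemory(
  env: { DB: D1Database; VECTORS: VectorizeIndex },
  embedder: { embed(text: string): Promise<number[]> },
  input: StoreInput,
): Promise<StoreResult> {
  const values = await embedder.embed(input.content);  // 1. embed synchronously
  const id = ulid();                                    // 2. generate a ULID
  await env.DB.prepare(
    "INSERT INTO memories (id, tenant_id, agent_id, content, metadata) VALUES (?, ?, ?, ?, ?)",
  )
    .bind(id, input.tenantId, input.agentId, input.content, JSON.stringify(input.metadata ?? {}))
    .run();                                             // 3. D1 row: content + metadata
  await env.VECTORS.insert([
    { id, values, metadata: { tenant_id: input.tenantId, agent_id: input.agentId } },
  ]);                                                   // 4. vector + metadata for filtering
  return { id };                                        // 5. return the ULID
}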

recall_memory follows the same path in the opposite direction:

  1. Embed the query
  2. Search Vectorize with the embedding + metadata filter
  3. Hydrate full content from D1 by ID
  4. Return the result

One embedding call per recall_memory invocation. At our scale, ~$0.0000004 per query — effectively free.
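
A matching sketch of the read path, under the same assumptions about bindings, adapter, and schema (the SearchQuery and SearchResult shapes are sketched below with the gateway contract):

// Illustrative read path: one embedding call, one filtered Vectorize query, one D1 hydrate.
async function searchMemory(
  env: { DB: D1Database; VECTORS: VectorizeIndex },
  embedder: { embed(text: string): Promise<number[]> },
  q: SearchQuery,
): Promise<SearchResult> {
  const vector = await embedder.embed(q.query);              // 1. embed the query
  const hits = await env.VECTORS.query(vector, {
    topK: q.topK ?? 5,
    filter: { tenant_id: q.tenantId, agent_id: q.agentId },  // 2. filter, then similarity search
  });
  if (hits.matches.length === 0) return { matches: [] };
  const ids = hits.matches.map((m) => m.id);
  const placeholders = ids.map(() => "?").join(", ");
  const { results } = await env.DB.prepare(
    `SELECT id, content, metadata FROM memories WHERE id IN (${placeholders})`,
  ).bind(...ids).all<{ id: string; content: string; metadata: string }>();  // 3. hydrate from D1
  const scoreById = new Map(hits.matches.map((m) => [m.id, m.score]));
  return {
    matches: results.map((r) => ({
      id: r.id,
      content: r.content,
      metadata: JSON.parse(r.metadata),
      score: scoreById.get(r.id),
    })),                                                     // 4. return content + metadata
  };
}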

The gateway contract from core/memory.ts stays minimal:

interface LongTermMemory {
  store(input: StoreInput): Promise<StoreResult>;
  search(query: SearchQuery): Promise<SearchResult>;
  delete(id: MemoryId): Promise<void>;
}

Three methods. No searchByMetadata, no bulkStore, no getById. Add them when there’s a use case.
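
The supporting types are not pinned down by this ADR; one plausible shape, consistent with the scoping rules above (all field names are assumptions):

type MemoryId = string;  // ULID

interface StoreInput {
  tenantId: string;                    // injected by the runtime, not chosen by the agent
  agentId: string;                     // injected by the runtime, not chosen by the agent
  content: string;
  metadata?: Record<string, unknown>;
}

interface StoreResult {
  id: MemoryId;
}

interface SearchQuery {
  tenantId: string;
  agentId: string;
  query: string;
  topK?: number;
}

interface SearchResult {
  matches: Array<{
    id: MemoryId;
    content: string;
    metadata: Record<string, unknown>;
    score?: number;                    // similarity score from Vectorize, when available
  }>;
}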

Phase 1’s order-triage demo exercises the full path:

  1. Operator hits /admin/seed-memory with the refund-history fixtures. The seed handler iterates over the 10 entries; for each one it calls gateway.store(), which embeds the content and writes to both stores (a sketch follows this list).
  2. A customer email lands; triage delegates to refund_decision.
  3. refund_decision’s first tool call is recall_memory("refund history for <email>"). The gateway embeds the query, searches Vectorize with the tenant_id=default, agent_id=agent-refund-decision filter, takes the top-K matches, hydrates them from D1, and returns content + metadata.
  4. refund_decision reasons over the matches and decides.
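
As a rough sketch of step 1, assuming the fixture shape and wiring shown here (neither is specified by the ADR):

// Illustrative body of the /admin/seed-memory handler: one store() call per fixture.
async function seedRefundHistory(
  gateway: LongTermMemory,
  fixtures: Array<{ content: string; metadata: Record<string, unknown> }>,
): Promise<MemoryId[]> {
  const ids: MemoryId[] = [];
  for (const fixture of fixtures) {
    // Each store() embeds the content and writes to both D1 and Vectorize.
    const { id } = await gateway.store({
      tenantId: "default",               // Phase 1 is single-tenant
      agentId: "agent-refund-decision",  // the agent that will later recall these entries
      content: fixture.content,
      metadata: fixture.metadata,
    });
    ids.push(id);
  }
  return ids;
}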

The delay added to the agent turn is one embedding call (~50ms) plus one Vectorize query (~50ms) plus one D1 hydrate (~10ms) = ~110ms total. Negligible compared to the ~10-second LLM call that follows.

Accepted tradeoffs:

  • Embedding cost on every recall. OpenAI’s text-embedding-3-small is cheap; this is fine. We could cache embeddings for repeated queries, but the hit rate would be too low to be worth it.
  • No batch recall in v1. The agent typically calls recall_memory(query) once per turn. If a scenario emerges that needs multiple recalls in one turn, we’d add recall_memory_batch — additive, no breaking change.
  • Eventual consistency between D1 and Vectorize. A vanishingly rare race: write to D1 succeeds, write to Vectorize succeeds, but Vectorize’s metadata index hasn’t fully propagated yet. The next recall might miss the just-written entry. Acceptable; the entry is durable in D1 and discoverable on the next call.
  • Vectorize wipe-by-tenant doesn’t exist as of mid-2026. Wiping a tenant’s vectors requires recreating the entire index. Acceptable for v1 single-tenant; tracked as follow-up #11.

Consequences:

  • The platform’s “agents that remember” claim is real.
  • Adding a new agent that benefits from long-term memory is a YAML edit (memory_config.long_term_enabled: true); no code changes.
  • Adding a new tenant is a config edit; the structural scoping enforces isolation.
  • Replacing the embedding provider is a one-package change. The gateway doesn’t know which provider is in use.

For the original ADR with full Context / Decision / Consequences / Alternatives sections, see ADR-0030 source.

Related decisions:

  • ADR-0027 — storage primitives (Vectorize + D1 split)
  • ADR-0029 — embedding provider abstraction
  • ADR-0006 — six-layer context system (long-term memory is layer 6)