ADR-0030 — Long-term memory access pattern
The hardest design problem of Phase 1. How agents read and write persistent memory; how per-tenant scoping is enforced; how the embedding adapter interacts with Vectorize and D1.
What this decision settles
By the time this ADR was written, ADR-0027 had committed long-term memory = D1 (source of truth) + Vectorize (vector index, joined by ID). ADR-0029 had committed the embedding adapter interface. Neither settled how agents use long-term memory at runtime.
This ADR settles six questions that were left open:
- Access pattern. Pre-turn retrieval vs. tool-based retrieval vs. both?
- Per-tenant scoping mechanism. Working memory was per-tenant via the per-job Durable Object. Long-term memory has no DO scoping it; the mechanism must be explicit.
- Embedding-on-write coupling. Synchronous on the write path or async via queue?
- Read-time embedding. Same question, mirror.
- Gateway shape. What does the `LongTermMemoryGateway` interface look like beyond the trivial `store` / `search` / `delete`?
- The contract from `core/memory.ts`. Does the existing `LongTermMemory` interface survive contact with the implementation?
Why this matters
Long-term memory is the single feature that makes agents genuinely “remember” in a useful way. Working memory is ephemeral — it disappears at turn end. Without long-term memory, every turn starts from zero context about what happened in past runs.
The order-triage scenario depends on this. The `refund_decision` agent calls `recall_memory("refund history for sara@example.com")` and gets back a list of past refunds with metadata. The decision the agent then makes — auto-approve, escalate, decline — is informed by what it found.
If the access pattern is wrong, the feature doesn’t work. If the tenant-scoping mechanism is wrong, one tenant’s data leaks to another’s queries. If the embedding-on-write coupling is wrong, write latency makes the feature unusable.
The decisions
Access pattern: agent-driven via tools, not pre-turn retrieval
The runtime does not retrieve memories before the turn and stuff them into the prompt. Instead, the agent’s tool list includes `recall_memory(query, top_k?)` — the agent decides when to call it.
Why:
- Selective. Most turns don’t need memory recall; pre-fetching for every turn wastes embedding cost and prompt space.
- Explicit. When the agent decides to recall, it’s a visible tool call in the trace. Debugging is straightforward.
- Composable with delegation. A sub-agent’s tool list includes its own `recall_memory` scoped to its own `agent_id`. No special pre-turn logic for sub-agents.
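
A minimal sketch of that tool shape, assuming hypothetical names for the tool definition, the context bundle, and the gateway’s `SearchQuery` fields (the actual `core/` types may differ):

```ts
// Sketch only: field names on SearchQuery and the context bundle are assumptions.
interface RecallMemoryArgs {
  query: string;
  top_k?: number;
}

function makeRecallMemoryTool(gateway: LongTermMemory) {
  return {
    name: "recall_memory",
    description: "Search this agent's long-term memory for relevant past entries.",
    // The runtime invokes this when the model emits a recall_memory tool call.
    // tenant_id / agent_id come from the runtime's context bundle, never from the model.
    async execute(args: RecallMemoryArgs, ctx: { tenantId: string; agentId: string }) {
      return gateway.search({
        text: args.query,
        topK: args.top_k ?? 5,
        tenantId: ctx.tenantId,
        agentId: ctx.agentId,
      });
    },
  };
}
```

The important part of the shape is that `tenant_id` and `agent_id` never appear in the tool’s argument schema; the agent only chooses the query and how many results it wants.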
Per-tenant scoping: structural, via Vectorize metadata indexes
Vectorize supports metadata indexes; we declare two (`tenant_id`, `agent_id`) when we create the index. Every `recall_memory` call filters on both before similarity search runs. The runtime composes these filters from the context bundle’s `tenant_id` and the calling agent’s `agent_id` — the agent itself cannot specify them.
This is the security property: a malicious or buggy agent cannot retrieve another tenant’s memories. The filter is applied below the agent’s reach.
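
A minimal sketch of that composition, assuming illustrative field names on the context bundle:

```ts
// Sketch only: tenantId/agentId on the context bundle are illustrative names.
// Both properties must be declared as metadata indexes when the Vectorize index
// is created, so the filter narrows candidates before similarity search runs.
function scopingFilter(ctx: { tenantId: string; agentId: string }) {
  // Nothing here reads recall_memory's arguments, so the agent cannot widen the scope.
  return { tenant_id: ctx.tenantId, agent_id: ctx.agentId };
}
```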
Embedding-on-write: synchronous
When the agent calls `store_memory(content, metadata)`, the gateway:
- Embeds `content` synchronously via the embedding adapter
- Generates a ULID
- Inserts the row into D1 (content + metadata)
- Inserts the vector into Vectorize (just embedding + metadata for filtering)
- Returns the ULID
If any step fails, no entry exists. We considered async-via-queue and rejected it: the failure semantics are weaker (an entry exists in D1 but is unsearchable), and Phase 1’s write volume is low enough that synchronous is fine.
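
A sketch of that write path, assuming the `ulid` npm package for ID generation, the Workers `D1Database` / `VectorizeIndex` bindings, and an illustrative table name (the real gateway code may differ):

```ts
import { ulid } from "ulid"; // assumption: ULIDs come from the `ulid` npm package

// Sketch of the synchronous write path; `memories` and its columns are illustrative.
async function storeMemory(
  input: { content: string; metadata: Record<string, string>; tenantId: string; agentId: string },
  deps: { embed: (text: string) => Promise<number[]>; db: D1Database; index: VectorizeIndex }
): Promise<{ id: string }> {
  // 1. Embed synchronously; if this throws, nothing has been written anywhere.
  const vector = await deps.embed(input.content);

  // 2. One ULID joins the D1 row and the Vectorize vector.
  const id = ulid();

  // 3. D1 is the source of truth: full content + metadata.
  await deps.db
    .prepare("INSERT INTO memories (id, tenant_id, agent_id, content, metadata) VALUES (?, ?, ?, ?, ?)")
    .bind(id, input.tenantId, input.agentId, input.content, JSON.stringify(input.metadata))
    .run();

  // 4. Vectorize holds only the embedding plus the metadata used for filtering.
  await deps.index.upsert([
    { id, values: vector, metadata: { tenant_id: input.tenantId, agent_id: input.agentId } },
  ]);

  return { id };
}
```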
Read-time embedding: also synchronous
Same path, opposite direction:
- Embed the query
- Search Vectorize with the embedding + metadata filter
- Hydrate full content from D1 by ID
- Return the result
One embedding call per `recall_memory` invocation. At our scale, ~$0.0000004 per query — effectively free.
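
Under the same assumptions as the write-path sketch (illustrative table name, Workers `D1Database` / `VectorizeIndex` bindings), the read path might look like:

```ts
// Sketch of the read path, mirroring the write path above.
async function searchMemory(
  query: { text: string; topK?: number; tenantId: string; agentId: string },
  deps: { embed: (text: string) => Promise<number[]>; db: D1Database; index: VectorizeIndex }
) {
  // 1. Embed the query text: one embedding call per recall_memory invocation.
  const vector = await deps.embed(query.text);

  // 2. Similarity search, scoped by the structural tenant/agent filter.
  const { matches } = await deps.index.query(vector, {
    topK: query.topK ?? 5,
    filter: { tenant_id: query.tenantId, agent_id: query.agentId },
  });

  // 3. Hydrate full content from D1 by ID; Vectorize only stored the embedding.
  const ids = matches.map((m) => m.id);
  if (ids.length === 0) return [];
  const placeholders = ids.map(() => "?").join(", ");
  const { results } = await deps.db
    .prepare(`SELECT id, content, metadata FROM memories WHERE id IN (${placeholders})`)
    .bind(...ids)
    .all();

  // 4. Return hydrated rows (content + metadata) to the agent.
  return results;
}
```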
Gateway shape: narrow
```ts
interface LongTermMemory {
  store(input: StoreInput): Promise<StoreResult>;
  search(query: SearchQuery): Promise<SearchResult>;
  delete(id: MemoryId): Promise<void>;
}
```

Three methods. No `searchByMetadata`, no `bulkStore`, no `getById`. Add them when there’s a use case.
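
The supporting types live in `core/memory.ts` and aren’t fixed by this ADR; purely as an illustration (field names are assumptions, not the actual contract), they might look like:

```ts
// Illustrative shapes only; the real contract is defined in core/memory.ts.
type MemoryId = string; // ULID

interface StoreInput {
  tenantId: string;
  agentId: string;
  content: string;
  metadata?: Record<string, string>;
}

interface StoreResult {
  id: MemoryId;
}

interface SearchQuery {
  tenantId: string;
  agentId: string;
  text: string;
  topK?: number;
}

interface SearchResult {
  entries: Array<{ id: MemoryId; content: string; metadata: Record<string, string>; score: number }>;
}
```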
What this looks like in production
Phase 1’s order-triage demo exercises the full path:
- Operator hits `/admin/seed-memory` with the refund-history fixtures. The seed handler iterates 10 entries; for each one, it calls `gateway.store()`, which embeds + writes to both stores.
- A customer email lands; triage delegates to `refund_decision`.
- `refund_decision`’s first tool call is `recall_memory("refund history for <email>")`. The gateway embeds the query, searches Vectorize with the `tenant_id=default, agent_id=agent-refund-decision` filter, gets top-K matches, hydrates from D1, and returns content + metadata.
- `refund_decision` reasons over the matches and decides.
The delay added to the agent turn is one embedding call (~50ms) plus one Vectorize query (~50ms) plus one D1 hydrate (~10ms) = ~110ms total. Negligible compared to the ~10-second LLM call that follows.
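
As an illustration of the seeding step, a handler might look like the sketch below; the fixture shape and handler signature are assumptions, and only the `/admin/seed-memory` route, the tenant/agent IDs, and the `gateway.store()` call come from the walkthrough above:

```ts
// Illustrative only: RefundFixture and the handler wiring are assumptions.
interface RefundFixture {
  content: string;
  metadata: Record<string, string>;
}

async function handleSeedMemory(gateway: LongTermMemory, fixtures: RefundFixture[]) {
  const ids: string[] = [];
  // Each store() call embeds synchronously and writes to both D1 and Vectorize.
  for (const fixture of fixtures) {
    const result = await gateway.store({
      tenantId: "default",
      agentId: "agent-refund-decision",
      content: fixture.content,
      metadata: fixture.metadata,
    });
    ids.push(result.id);
  }
  return { seeded: ids.length, ids };
}
```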
Trade-offs we accepted
- Embedding cost on every recall. OpenAI’s `text-embedding-3-small` is cheap; this is fine. We could cache embeddings for repeated queries, but the hit rate is too low to be worth it.
- No batch recall in v1. The agent typically calls `recall_memory(query)` once per turn. If a scenario emerges that needs multiple recalls in one turn, we’d add `recall_memory_batch` — additive, no breaking change.
- Eventual consistency between D1 and Vectorize. A vanishingly rare race: the write to D1 succeeds, the write to Vectorize succeeds, but Vectorize’s metadata index hasn’t fully propagated yet. The next recall might miss the just-written entry. Acceptable; the entry is durable in D1 and discoverable on the next call.
- Vectorize wipe-by-tenant doesn’t exist as of mid-2026. Wiping a tenant’s vectors requires recreating the entire index. Acceptable for v1 single-tenant; tracked as follow-up #11.
What this enables
- The platform’s “agents that remember” claim is real.
- Adding a new agent that benefits from long-term memory is a YAML edit (`memory_config.long_term_enabled: true`); no code changes.
- Adding a new tenant is a config edit; the structural scoping enforces isolation.
- Replacing the embedding provider is a one-package change. The gateway doesn’t know which provider is in use.
Where to next
For the original ADR with full Context / Decision / Consequences / Alternatives sections, see ADR-0030 source.
Related decisions: