ADR-0030 — Long-term memory access pattern

The hardest design problem of Phase 1. How agents read and write persistent memory; how per-tenant scoping is enforced; how the embedding adapter interacts with Vectorize and D1.

By the time this ADR was written, ADR-0027 had committed long-term memory = D1 (source of truth) + Vectorize (vector index, joined by ID). ADR-0029 had committed the embedding adapter interface. Neither settled how agents use long-term memory at runtime.

This ADR settles six questions that were left open:

  1. Access pattern. Pre-turn retrieval vs. tool-based retrieval vs. both?
  2. Per-tenant scoping mechanism. Working memory was per-tenant via the per-job Durable Object. Long-term memory has no DO scoping it; the mechanism must be explicit.
  3. Embedding-on-write coupling. Synchronous on the write path or async via queue?
  4. Read-time embedding. The same question, mirrored on the read path.
  5. Gateway shape. What does the LongTermMemoryGateway interface look like beyond the trivial store / search / delete?
  6. The contract from core/memory.ts. Does the existing LongTermMemory interface survive contact with the implementation?

Long-term memory is the single feature that makes agents genuinely “remember” in a useful way. Working memory is ephemeral — it disappears at turn end. Without long-term memory, every turn starts from zero context about what happened in past runs.

The order-triage scenario depends on this. The refund_decision agent calls recall_memory("refund history for sara@example.com") and gets back a list of past refunds with metadata. The decision the agent then makes — auto-approve, escalate, decline — is informed by what it found.

If the access pattern is wrong, the feature doesn’t work. If the tenant-scoping mechanism is wrong, one tenant’s data leaks to another’s queries. If the embedding-on-write coupling is wrong, write latency makes the feature unusable.

Access pattern: agent-driven via tools, not pre-turn retrieval

The runtime does not retrieve memories before the turn and stuff them into the prompt. Instead, the agent’s tool list includes recall_memory(query, top_k?) — the agent decides when to call it.

Why:

  • Selective. Most turns don’t need memory recall; pre-fetching for every turn wastes embedding cost and prompt space.
  • Explicit. When the agent decides to recall, it’s a visible tool call in the trace. Debugging is straightforward.
  • Composable with delegation. A sub-agent’s tool list includes its own recall_memory scoped to its own agent_id. No special pre-turn logic for sub-agents.
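
To make the pattern concrete, here is a rough sketch of how the recall_memory tool could appear in the agent’s tool list. The variable name and schema shape are illustrative, not taken from the runtime:

// Illustrative tool definition. The agent controls only query and top_k;
// tenant and agent scoping is applied by the runtime's handler, not here.
const recallMemoryTool = {
  name: "recall_memory",
  description: "Search this agent's long-term memory for relevant past entries.",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string", description: "Natural-language description of what to recall." },
      top_k: { type: "number", description: "Maximum number of matches to return (optional)." },
    },
    required: ["query"],
  },
} as const;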

Per-tenant scoping: structural, via Vectorize metadata indexes

Vectorize supports metadata indexes; we declare two (tenant_id, agent_id) when we create the index. Every recall_memory call filters on both before the similarity search runs. The runtime composes these filters from the context bundle’s tenant_id and the calling agent’s agent_id; the agent itself cannot specify them.

This is the security property: a malicious or buggy agent cannot retrieve another tenant’s memories. The filter is applied below the agent’s reach.
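
A minimal sketch of how the runtime might compose that call. The handler name, context-bundle field names, and SearchQuery shape are assumptions (one plausible SearchQuery is sketched later alongside the gateway contract):

// Illustrative tool handler. The agent supplies only query and top_k;
// tenant_id and agent_id come from the runtime's context bundle.
async function handleRecallMemory(
  ctx: { tenantId: string; agentId: string },  // from the context bundle, never from the agent
  args: { query: string; top_k?: number },     // the only fields the agent controls
  gateway: LongTermMemory,
) {
  return gateway.search({
    query: args.query,
    topK: args.top_k ?? 5,
    tenantId: ctx.tenantId,  // structural scoping, applied below the agent's reach
    agentId: ctx.agentId,
  });
}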

When the agent calls store_memory(content, metadata), the gateway:

  1. Embeds content synchronously via the embedding adapter
  2. Generates a ULID
  3. Inserts the row into D1 (content + metadata)
  4. Inserts the vector into Vectorize (just embedding + metadata for filtering)
  5. Returns the ULID

If any step fails, no entry exists. We considered async-via-queue and rejected it: the failure semantics are weaker (an entry exists in D1 but is unsearchable), and Phase 1’s write volume is low enough that synchronous is fine.
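
A sketch of that write path, written as a free function rather than a gateway method for brevity. The binding names (DB, VECTORS), the Workers types, the embed() adapter method, the table schema, the ulid package, and the StoreInput shape (sketched later) are all assumptions, not commitments from this ADR:

import { ulid } from "ulid";

// Illustrative synchronous write path: every step awaits before the ULID is returned.
async function storeMemory(
  env: { DB: D1Database; VECTORS: VectorizeIndex },
  embedder: { embed(text: string): Promise<number[]> },
  input: StoreInput,
): Promise<StoreResult> {
  const values = await embedder.embed(input.content);  // 1. embed synchronously
  const id = ulid();                                    // 2. generate a ULID
  await env.DB.prepare(
    "INSERT INTO memories (id, tenant_id, agent_id, content, metadata) VALUES (?, ?, ?, ?, ?)",
  )
    .bind(id, input.tenantId, input.agentId, input.content, JSON.stringify(input.metadata ?? {}))
    .run();                                             // 3. D1 row: content + metadata
  await env.VECTORS.insert([
    { id, values, metadata: { tenant_id: input.tenantId, agent_id: input.agentId } },
  ]);                                                   // 4. vector + metadata for filtering
  return { id };                                        // 5. return the ULID
}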

recall_memory follows the same path in the opposite direction:

  1. Embed the query
  2. Search Vectorize with the embedding + metadata filter
  3. Hydrate full content from D1 by ID
  4. Return the result

One embedding call per recall_memory invocation. At our scale, ~$0.0000004 per query — effectively free.
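
A matching sketch of the read path, under the same assumptions about bindings, adapter, and schema (the SearchQuery and SearchResult shapes are sketched below with the gateway contract):

// Illustrative read path: one embedding call, one filtered Vectorize query, one D1 hydrate.
async function searchMemory(
  env: { DB: D1Database; VECTORS: VectorizeIndex },
  embedder: { embed(text: string): Promise<number[]> },
  q: SearchQuery,
): Promise<SearchResult> {
  const vector = await embedder.embed(q.query);              // 1. embed the query
  const hits = await env.VECTORS.query(vector, {
    topK: q.topK ?? 5,
    filter: { tenant_id: q.tenantId, agent_id: q.agentId },  // 2. filter, then similarity search
  });
  if (hits.matches.length === 0) return { matches: [] };
  const ids = hits.matches.map((m) => m.id);
  const placeholders = ids.map(() => "?").join(", ");
  const { results } = await env.DB.prepare(
    `SELECT id, content, metadata FROM memories WHERE id IN (${placeholders})`,
  ).bind(...ids).all<{ id: string; content: string; metadata: string }>();  // 3. hydrate from D1
  const scoreById = new Map(hits.matches.map((m) => [m.id, m.score]));
  return {
    matches: results.map((r) => ({
      id: r.id,
      content: r.content,
      metadata: JSON.parse(r.metadata),
      score: scoreById.get(r.id),
    })),                                                     // 4. return content + metadata
  };
}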

The gateway contract from core/memory.ts stays minimal:

interface LongTermMemory {
  store(input: StoreInput): Promise<StoreResult>;
  search(query: SearchQuery): Promise<SearchResult>;
  delete(id: MemoryId): Promise<void>;
}

Three methods. No searchByMetadata, no bulkStore, no getById. Add them when there’s a use case.
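
The supporting types are not pinned down by this ADR; one plausible shape, consistent with the scoping rules above (all field names are assumptions):

type MemoryId = string;  // ULID

interface StoreInput {
  tenantId: string;                    // injected by the runtime, not chosen by the agent
  agentId: string;                     // injected by the runtime, not chosen by the agent
  content: string;
  metadata?: Record<string, unknown>;
}

interface StoreResult {
  id: MemoryId;
}

interface SearchQuery {
  tenantId: string;
  agentId: string;
  query: string;
  topK?: number;
}

interface SearchResult {
  matches: Array<{
    id: MemoryId;
    content: string;
    metadata: Record<string, unknown>;
    score?: number;                    // similarity score from Vectorize, when available
  }>;
}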

Phase 1’s order-triage demo exercises the full path:

  1. Operator hits /admin/seed-memory with the refund-history fixtures. The seed handler iterates over the 10 entries; for each one it calls gateway.store(), which embeds the content and writes to both stores (a sketch follows this list).
  2. A customer email lands; triage delegates to refund_decision.
  3. refund_decision’s first tool call is recall_memory("refund history for <email>"). The gateway embeds the query, searches Vectorize with the tenant_id=default, agent_id=agent-refund-decision filter, takes the top-K matches, hydrates them from D1, and returns content + metadata.
  4. refund_decision reasons over the matches and decides.
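
As a rough sketch of step 1, assuming the fixture shape and wiring shown here (neither is specified by the ADR):

// Illustrative body of the /admin/seed-memory handler: one store() call per fixture.
async function seedRefundHistory(
  gateway: LongTermMemory,
  fixtures: Array<{ content: string; metadata: Record<string, unknown> }>,
): Promise<MemoryId[]> {
  const ids: MemoryId[] = [];
  for (const fixture of fixtures) {
    // Each store() embeds the content and writes to both D1 and Vectorize.
    const { id } = await gateway.store({
      tenantId: "default",               // Phase 1 is single-tenant
      agentId: "agent-refund-decision",  // the agent that will later recall these entries
      content: fixture.content,
      metadata: fixture.metadata,
    });
    ids.push(id);
  }
  return ids;
}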

The delay added to the agent turn is one embedding call (~50ms) plus one Vectorize query (~50ms) plus one D1 hydrate (~10ms) = ~110ms total. Negligible compared to the ~10-second LLM call that follows.

Accepted tradeoffs:

  • Embedding cost on every recall. OpenAI’s text-embedding-3-small is cheap; this is fine. We could cache embeddings for repeated queries, but the hit rate would be too low to be worth it.
  • No batch recall in v1. The agent typically calls recall_memory(query) once per turn. If a scenario emerges that needs multiple recalls in one turn, we’d add recall_memory_batch — additive, no breaking change.
  • Eventual consistency between D1 and Vectorize. A vanishingly rare race: write to D1 succeeds, write to Vectorize succeeds, but Vectorize’s metadata index hasn’t fully propagated yet. The next recall might miss the just-written entry. Acceptable; the entry is durable in D1 and discoverable on the next call.
  • Vectorize wipe-by-tenant doesn’t exist as of mid-2026. Wiping a tenant’s vectors requires recreating the entire index. Acceptable for v1 single-tenant; tracked as follow-up #11.

Consequences:

  • The platform’s “agents that remember” claim is real.
  • Adding a new agent that benefits from long-term memory is a YAML edit (memory_config.long_term_enabled: true); no code changes.
  • Adding a new tenant is a config edit; the structural scoping enforces isolation.
  • Replacing the embedding provider is a one-package change. The gateway doesn’t know which provider is in use.

For the original ADR with full Context / Decision / Consequences / Alternatives sections, see ADR-0030 source.

Related decisions:

  • ADR-0027 — storage primitives (Vectorize + D1 split)
  • ADR-0029 — embedding provider abstraction
  • ADR-0006 — six-layer context system (long-term memory is layer 6)