ADR-0027: Storage primitives per memory layer and data type


Status: Accepted
Date: 2026-04-28
Supersedes: Implicit assumptions from the v1.0.0 design doc (no prior ADR formalized this mapping)

The platform has five Cloudflare storage primitives bound and available, plus one observability sink:

  • D1 — SQLite, strong consistency, regional, SQL with joins. Cost scales per-query. Good for structured records, bad for high-write-throughput or large blobs.
  • KV — eventually consistent (up to ~60s globally), reads cached at the edge, writes propagate with lag. Good for read-heavy data tolerant of staleness, bad for read-after-write semantics.
  • Durable Object Storage — strongly consistent within the DO, transactional, scoped to a single addressable instance. Per-key limit ~128KB, total per-DO in the GB range. Good for in-flight session state, bad for cross-instance shared data.
  • Vectorize — vector index only. Stores (id, vector, metadata) triples; the actual text/document must live elsewhere and be joined by ID.
  • R2 — object storage, S3-compatible, no egress fees. Good for large blobs, bad for many small frequent reads.
  • Workers Analytics Engine — high-cardinality time-series, automatic sampling at high write rates, cheap at scale, queryable via Cloudflare’s GraphQL Analytics API. Not “storage” in the durable-record sense, but earns a place here because it is the home for aggregate observability data.
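
For orientation, a minimal sketch of what the Worker environment looks like once all six are bound. Binding names other than AGENT_TRACES (named in the implementation notes below) are placeholders, not the names in wrangler.toml:

```ts
// Illustrative Env shape only; binding names other than AGENT_TRACES are assumptions.
// Types come from @cloudflare/workers-types.
interface Env {
  DB: D1Database;                       // D1: source of truth for structured records
  JOB_INDEX: KVNamespace;               // KV: derived indexes and caches, eventually consistent
  AGENT_JOB: DurableObjectNamespace;    // DO: per-run state (delegated context, working memory)
  MEMORY_INDEX: VectorizeIndex;         // Vectorize: embeddings only, joined back to D1 by id
  ARTIFACTS: R2Bucket;                  // R2: large blobs (reports, exports, generated images)
  AGENT_TRACES: AnalyticsEngineDataset; // Analytics Engine: aggregate observability
}
```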

ADR-0014 commits the runtime to Cloudflare Workers. ADR-0024 commits Durable Objects for async job records. The rest of the storage question — which primitive holds working memory, long-term memory, shared context, durable history, traces — has been decided ad-hoc per component. The next two pieces of work (working-memory subsystem, long-term-memory subsystem) need this settled before they can be built coherently.

A meta-question worth surfacing: should the platform stay on Cloudflare-native primitives for all of these? Vectorize is younger than Pinecone or Turbopuffer; D1 has had reliability noise; KV’s eventual consistency is sometimes wrong-shaped. The answer for v1 is yes — the runtime is already at the edge, switching to off-Cloudflare primitives introduces cross-network latency that would dominate any quality difference, and none of the primitives is currently failing us. Postgres via Hyperdrive remains the named escape hatch (see “Explicitly deferred” below).

Per memory layer (the 6-layer context model)

| Layer | Primitive | Notes |
| --- | --- | --- |
| 1. Core Context (immutable) | Bundled in code | Not “storage” — part of the agent definition, loaded at instantiation |
| 2. Characteristics (immutable) | Bundled in code | Same as Core |
| 3. Shared Context (read-only) | D1 as source of truth | KV read-through cache deferred — interface designed to allow this without breaking changes |
| 4. Delegated Context (per-task) | DO Storage | Lives inside the AGENT_JOB DO that owns the task; ephemeral with the run |
| 5. Working Memory (sliding window) | DO Storage | The DO that owns the run also owns its working memory; strong consistency, naturally scoped |
| 6. Long-term Memory (persistent) | Vectorize + D1 | Vectorize holds embeddings, D1 holds the text/metadata; retrieval joins on ID |

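
The Layer 3 note that a KV cache can be added “without breaking changes” rests on hiding the primitive behind a small read interface. A hedged sketch of what that could look like; the SharedContextStore name, class, and shared_context table are illustrative assumptions, not existing code:

```ts
// Hypothetical interface; name and shape are assumptions for illustration.
interface SharedContextStore {
  get(tenantId: string, key: string): Promise<string | null>;
}

// v1: D1 is the only backend.
class D1SharedContext implements SharedContextStore {
  constructor(private db: D1Database) {}

  async get(tenantId: string, key: string): Promise<string | null> {
    const row = await this.db
      .prepare("SELECT value FROM shared_context WHERE tenant_id = ? AND key = ?")
      .bind(tenantId, key)
      .first<{ value: string }>();
    return row?.value ?? null;
  }
}

// Later (deferred): a KV read-through wrapper implements the same interface in front
// of the D1 backend, so callers never change. D1 stays authoritative per rule 1 below.
```
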
Per data type

| Data | Primitive | Notes |
| --- | --- | --- |
| Job records (durable history) | D1 | Already in place per ADR-0024; tenant-scoped rows |
| Job index (list-by-tenant, list-by-status) | KV | Already in place; eventual consistency tolerable for listings |
| Per-run audit trail (every tool call in a specific run) | D1 | Joined to the job row; supports exact replay |
| Aggregate traces (latency p50/p99 by tool, error rates, cost per tenant) | Workers Analytics Engine | Sampled at high write rates; right shape for dashboards |
| Tenant / business config | D1 | Low volume, relational, queryable |
| Large artifacts (reports, exports, generated images) | R2 | Anything over ~100KB or binary |
| Agent definitions (when wizard exists) | D1 | Deferred to wizard work; flagged here for completeness |


Cross-cutting rules:

  1. D1 is the source of truth for any data the platform cannot regenerate. KV holds caches and indexes derivable from D1. If KV diverges, D1 wins.
  2. Vectorize never holds the only copy of anything. Every vector has a D1 row. If the Vectorize index is lost, re-embed from D1.
  3. Durable Object storage is for state that has a clear owner and lifecycle — a job, a session, a conversation. Anything that needs cross-agent or cross-tenant access goes to D1.
  4. Tenant scoping is enforced at the primitive level from day one (see the sketch after this list). D1 schemas have a tenant_id column on every multi-tenant table. KV keys are prefixed t:<tenant_id>:.... DO IDs are derived from (tenant_id, job_id). Vectorize uses metadata filters by tenant_id, with namespace-per-tenant as a future-proofing option if metadata filtering proves insufficient at scale.
  5. R2 is only for blobs. Default to D1 for small structured data even if it feels file-like; switch to R2 only when size or binary-ness demands it.
  6. Observability is split by access pattern. Per-run audit (replay, debug a specific job) goes to D1. Aggregate observability (dashboards, alerts) goes to Workers Analytics Engine. Collapsing both into one sink was considered and rejected — the access patterns are genuinely different, and Analytics Engine’s automatic sampling is wrong for per-run replay.
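
Rule 4 in code form, as a hedged sketch; the helper names, key-format details, and filter shape are assumptions, not existing code:

```ts
// Illustrative only: helper names and the exact key format are assumptions.
function kvJobIndexKey(tenantId: string, jobId: string): string {
  // KV keys are tenant-prefixed so listings can use a per-tenant prefix scan.
  return `t:${tenantId}:job:${jobId}`;
}

function agentJobStub(env: Env, tenantId: string, jobId: string): DurableObjectStub {
  // DO IDs derived from (tenant_id, job_id): the same pair always routes to the same instance.
  const id = env.AGENT_JOB.idFromName(`${tenantId}:${jobId}`);
  return env.AGENT_JOB.get(id);
}

async function queryTenantMemory(env: Env, tenantId: string, embedding: number[]) {
  // Vectorize scoping via metadata filter; namespace-per-tenant remains the fallback option.
  return env.MEMORY_INDEX.query(embedding, {
    topK: 10,
    filter: { tenant_id: tenantId },
  });
}
```
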

Rejected alternatives:

  • KV for working memory. open-questions.md had named KV as a candidate for “simple key-value working memory.” Rejected because the run already has an addressable DO with strong consistency and a natural lifecycle, and KV’s eventual consistency is wrong for read-after-write within a single agent turn (sketched after this list). Working memory in DO storage compacts naturally because the DO’s lifecycle bounds the data.
  • Single observability sink. Putting all traces in D1 is simple but does not scale; putting all traces in Analytics Engine is cheap but loses per-run replay. The two-sink split is the cost of getting both properties.
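
To make the read-after-write point concrete, a minimal sketch of a working-memory write inside the run's Durable Object; the class shape, method, and key names are assumptions:

```ts
// Inside the AGENT_JOB Durable Object (classic API); method and key names are assumptions.
export class AgentJob {
  constructor(private state: DurableObjectState, private env: Env) {}

  async appendTurn(turn: { role: string; content: string }): Promise<void> {
    // Strong consistency within the DO: a read after this put sees the new turn,
    // which KV's propagation lag (up to ~60s) cannot guarantee.
    const window =
      (await this.state.storage.get<{ role: string; content: string }[]>("workingMemory")) ?? [];
    window.push(turn);
    await this.state.storage.put("workingMemory", window);
  }
}
```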

Becomes easy:

  • Working memory has a clear home (the DO that owns the run), with strong consistency and no extra moving parts.
  • Long-term memory has a clear retrieval pattern: vector search → IDs → D1 lookup (sketched after this list). The text is always queryable by other means too (timestamp, tag, agent), not just semantic search.
  • Adding caching later is a clean retrofit — D1 stays authoritative, KV layers in front for Layer 3 (Shared Context) when measured demand justifies it.
  • Aggregate dashboards (cost per tenant, p99 latency by tool, error rate trends) are cheap to build on Analytics Engine without polluting D1 with high-volume rows.
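
A hedged sketch of that retrieval pattern, assuming a memories table in D1 and the placeholder bindings from the sketches above:

```ts
// Vector search → IDs → D1 lookup. The `memories` table and its columns are assumptions.
async function recall(env: Env, tenantId: string, embedding: number[], k = 5) {
  const result = await env.MEMORY_INDEX.query(embedding, {
    topK: k,
    filter: { tenant_id: tenantId },
  });
  const ids = result.matches.map((m) => m.id);
  if (ids.length === 0) return [];

  // Join back to D1, which holds the actual text and metadata
  // (rule 2: Vectorize never holds the only copy).
  const placeholders = ids.map(() => "?").join(",");
  const rows = await env.DB
    .prepare(
      `SELECT id, text, created_at, tags FROM memories
       WHERE tenant_id = ? AND id IN (${placeholders})`,
    )
    .bind(tenantId, ...ids)
    .all();
  return rows.results;
}
```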

Becomes hard / accepted tradeoffs:

  • DO storage 128KB/key limit. Working memory windows must fit, which means a compaction strategy is needed when conversations grow long. Standard answer: bound the sliding window by token count, summarize evicted turns into long-term memory (sketched after this list). This means working memory is not “everything that ever happened in this run” — it is “the last N tokens, plus pointers to summaries.”
  • No hybrid (semantic + keyword) search out of the box. Vectorize is pure vector. Keyword fallback, if needed, is D1 LIKE/FTS. Acceptable for v1.
  • D1 write throughput. Per-run audit trail writes go to D1. If volume becomes a problem at scale, that specific stream can move to R2 or Analytics Engine, but D1 stays the source of truth for job records themselves.
  • Vectorize was chosen on platform-locality grounds, not benchmarked against alternatives. pgvector via Hyperdrive remains the fallback if filter expressivity or recall quality becomes limiting; revisit if the long-term-memory subsystem hits ceilings.
  • Two observability sinks to maintain. A small traceWriter wrapper will centralize the schema and write paths; the cost is one more package boundary to keep tidy.
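
A sketch of the compaction policy from the first tradeoff above. The countTokens and summarize helpers are hypothetical; the point is the shape: a token-bounded window, with evicted turns summarized into long-term memory:

```ts
// Illustrative compaction policy; countTokens and summarize are hypothetical helpers.
interface Turn { role: string; content: string; }

async function compactWindow(
  window: Turn[],
  budgetTokens: number,
  countTokens: (t: Turn) => number,
  summarize: (turns: Turn[]) => Promise<string>,
): Promise<{ window: Turn[]; evictedSummary?: string }> {
  let total = window.reduce((n, t) => n + countTokens(t), 0);
  const evicted: Turn[] = [];

  // Evict the oldest turns until the window fits the token budget
  // (and, indirectly, the ~128KB DO storage key limit).
  while (total > budgetTokens && window.length > 1) {
    const turn = window.shift()!;
    evicted.push(turn);
    total -= countTokens(turn);
  }
  if (evicted.length === 0) return { window };

  // Evicted turns become a summary destined for long-term memory (D1 row plus Vectorize
  // embedding, per the tables above); the caller persists it and keeps only a pointer.
  const evictedSummary = await summarize(evicted);
  return { window, evictedSummary };
}
```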

Explicitly deferred (do not decide here):

  • Embedding model and provider — belongs in the LLM abstraction ADR (next).
  • Read-through cache for Shared Context (D1 + KV) — interface designed to permit it; not built. Add when measured.
  • Backup / export / point-in-time recovery — operational concern, post-Phase-1.
  • Hybrid search — defer until a use case demands keyword + semantic.
  • Cross-region data residency — relevant for EU customers (Ganimarka and the planned vetzoo onboarding are both Swedish). D1 has location hints; addressed when a customer has a hard residency requirement.
  • Postgres via Hyperdrive as escape hatch. Named explicitly: if D1 hits scale or relational expressivity limits, or if Vectorize’s filter or recall quality becomes limiting, Hyperdrive + Postgres + pgvector is the bounded path off Cloudflare-native primitives. Not adopted speculatively; ADR amendment when triggered.

Revisit triggers:

  • Working memory exceeds 128KB/key for a real workload before compaction lands → working memory needs a different home or compaction becomes non-optional.
  • D1 query cost or latency dominates a hot path → measure, then either index more carefully, add KV cache, or escalate to Hyperdrive.
  • Vectorize recall or filter expressivity blocks a real long-term-memory use case → benchmark pgvector via Hyperdrive.
  • Analytics Engine sampling makes a needed dashboard inaccurate → that specific metric moves to D1 or R2-backed logs.
  • A customer has a hard data-residency requirement that cannot be met on Cloudflare’s current footprint → bounded Hyperdrive escape.

Implementation notes (for follow-up commits, not this ADR)

  • AGENT_TRACES Analytics Engine binding to be added to wrangler.toml when the trace writer ships.
  • Trace schema wrapper to live in packages/logger or a new packages/observability — boundary decision deferred to the implementation PR.
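
A hedged sketch of what the trace writer's two-sink split could look like; the traceToolCall name, the tool_calls table, and the field layout are assumptions for illustration:

```ts
// Two sinks, two access patterns. Table name and field layout are assumptions.
async function traceToolCall(
  env: Env,
  t: { tenantId: string; jobId: string; tool: string; ok: boolean; latencyMs: number; costUsd: number },
): Promise<void> {
  // Per-run audit → D1: exact, unsampled, joinable to the job row for replay.
  await env.DB
    .prepare(
      "INSERT INTO tool_calls (tenant_id, job_id, tool, ok, latency_ms, cost_usd) VALUES (?, ?, ?, ?, ?, ?)",
    )
    .bind(t.tenantId, t.jobId, t.tool, t.ok ? 1 : 0, t.latencyMs, t.costUsd)
    .run();

  // Aggregate observability → Analytics Engine: sampled at high volume, dashboard-shaped.
  env.AGENT_TRACES.writeDataPoint({
    indexes: [t.tenantId],
    blobs: [t.tool, t.jobId, t.ok ? "ok" : "error"],
    doubles: [t.latencyMs, t.costUsd],
  });
}
```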