ADR-0013: Enterprise readiness bar

Status: Accepted
Date: 2026-04-21

The platform is being built to be sold to e-commerce businesses as an autonomous operations layer. That means every component eventually sits in front of real customers, real money, and real regulated data (PII at minimum, payment context likely). The project’s owner has explicitly stated the preference: build for enterprise use from day one rather than ship a proof-of-concept and retrofit quality later.

“Enterprise-ready” is a phrase that is easy to agree to and hard to enforce. Without a written bar, reviewers fall back on personal taste, new contributors guess at what “good enough” means, and quality decays one small exception at a time.

This ADR is deliberately the first ADR of this kind — not an architecture decision in the traditional sense, but a meta-decision about what acceptance criteria apply to every other ADR and every PR. It exists so that later ADRs can reference it instead of re-justifying the same principles every time.

Scope: this applies to Platform Core (packages/*) and to first-party apps in apps/. Business Packs (future vertical packages) are expected to meet the same bar but may interpret “audit record” or “budget enforcement” in vertical-specific ways.

Every component shipped from this point forward must meet the following bars. “Meet” means the bar is demonstrable in code or CI — not claimed in a README. A PR that does not meet every applicable bar is not merged.

  1. Every trust boundary has a schema and tests for both valid and malicious input. “Trust boundary” = any point where data crosses from a less-trusted zone into a more-trusted one: wire → runtime (done for delegated contexts), file → runtime (applies to agent YAML and config), LLM output → runtime (applies to every model response), tool output → runtime (applies to every tool call), stored memory → runtime (applies to every retrieval). Validation is by a Zod schema in @agent-platform/schemas. Tests cover (a) well-formed input accepted, (b) structurally invalid input rejected, and (c) at least one realistic malicious input rejected.
  2. Every externally-loaded artifact has a threat model entry. The table in docs/architecture.md#threat-model is the single source. A component that introduces a new externally-loaded artifact (YAML file, stored memory, tool response, LLM response) adds one row per distinct threat and either shows the control in place or names it a gap with an explicit trigger to revisit — following the ADR-0012 pattern.
  3. Secrets never touch code, logs, or stored context. API keys, tokens, and credentials are loaded from environment or platform bindings only. Error messages are redacted before they reach any log, any event, any stored memory, or any LLM prompt. Every new component that touches a secret gets a test asserting the secret is absent from its error paths.
  4. Every decision involving data sourced from outside the signed repository gets its own ADR. The ADR-0012 trigger (“first time an agent definition is loaded from outside the repo”) is the pattern, not the exception. Applies to YAML loaded from user uploads, tool responses, any future plugin surface.
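
As an illustration of bar 3, the following is a minimal sketch of redacting a secret before it reaches an error message, together with the kind of check that asserts the secret is absent from the error path. Every name here (`redact`, `UpstreamAuthError`, `callUpstream`, `secretLeaksFromErrorPath`) is hypothetical and is not part of the platform's API; the upstream failure is simulated.

```typescript
// Hypothetical names throughout; a sketch of the pattern, not the platform's implementation.
const REDACTED = "[REDACTED]";

/** Replace every occurrence of each known secret value in `text`. */
function redact(text: string, secrets: readonly string[]): string {
  let out = text;
  for (const secret of secrets) {
    if (secret.length === 0) continue; // an empty string would match everywhere
    out = out.split(secret).join(REDACTED);
  }
  return out;
}

/** Typed error whose message is redacted at construction time. */
class UpstreamAuthError extends Error {
  readonly status: number;

  constructor(rawMessage: string, status: number, secrets: readonly string[]) {
    super(redact(rawMessage, secrets));
    this.name = "UpstreamAuthError";
    this.status = status;
  }
}

// Simulated component: the upstream echoes the Authorization header in its error body.
function callUpstream(apiKey: string): void {
  const upstreamBody = `401 Unauthorized: invalid header "Bearer ${apiKey}"`;
  throw new UpstreamAuthError(upstreamBody, 401, [apiKey]);
}

// The check bar 3 asks for: the secret must not appear anywhere in the error path.
function secretLeaksFromErrorPath(apiKey: string): boolean {
  try {
    callUpstream(apiKey);
  } catch (err) {
    const serialized =
      err instanceof Error ? `${err.message}\n${err.stack ?? ""}` : String(err);
    return serialized.includes(apiKey);
  }
  return false;
}
```

Redacting inside the error constructor means no caller can log the raw message by accident. A real component would likely also need to cover encoded or partial forms of the secret, which this sketch does not attempt.
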
  5. Every LLM call is fully traceable. Structured record includes: agent id, task id, model identifier, input token count, output token count, latency ms, cost usd, outcome (completed / refused / error), and a content-addressed hash of the assembled context bundle. This is persisted before the response is returned to the caller: not batched, not fire-and-forget. The record’s schema is an ADR once the LLM adapter ships.
  6. Every agent turn produces an auditable record. Turn = one assemble → LLM → respond cycle. The record links to (a) the context bundle hash, (b) every LLM call made during the turn, (c) every tool call made during the turn, (d) the final outcome. Sufficient to answer “why did the agent say X at time T” six months later without replaying the turn.
  7. Logs are structured, not grep-ready strings. console.log is not acceptable in platform code. Every log entry is JSON with at minimum { level, timestamp, agent_id?, task_id?, component, event, ...payload }. The logger is injected, not imported globally, so tests can assert log content.
  8. Errors are typed and structured. Every thrown error is an instance of a named class that extends Error. Every named class carries structured fields relevant to its failure mode (the ContextAssemblyError.issues field is the pattern). Every error class has a documented log level and a documented user-facing message policy.
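
The structured-logging and typed-error items can be sketched together. This is a rough illustration only: the `Logger` interface, `MemoryLogger`, `ToolTimeoutError`, and `runTool` are hypothetical names, and the field layout simply follows the minimum entry shape listed above. No logger exists in the platform yet, so nothing here describes shipped code.

```typescript
// Hypothetical sketch: an injected logger plus a typed error with structured fields.
type LogLevel = "debug" | "info" | "warn" | "error";

interface LogFields {
  level: LogLevel;
  component: string;
  event: string;
  agent_id?: string;
  task_id?: string;
  [payloadKey: string]: unknown; // the flat ...payload part of the entry
}

type LogEntry = LogFields & { timestamp: string };

interface Logger {
  log(fields: LogFields): void;
}

/** Test double: keeps serialized JSON lines in memory so tests can assert on them. */
class MemoryLogger implements Logger {
  readonly lines: string[] = [];
  private readonly now: () => Date;

  constructor(now: () => Date = () => new Date()) {
    this.now = now;
  }

  log(fields: LogFields): void {
    const entry: LogEntry = { ...fields, timestamp: this.now().toISOString() };
    this.lines.push(JSON.stringify(entry));
  }
}

/** Named error class carrying the fields relevant to its failure mode. */
class ToolTimeoutError extends Error {
  static readonly logLevel: LogLevel = "warn"; // the documented level for this class
  readonly tool: string;
  readonly timeoutMs: number;

  constructor(tool: string, timeoutMs: number) {
    super(`tool "${tool}" timed out after ${timeoutMs} ms`);
    this.name = "ToolTimeoutError";
    this.tool = tool;
    this.timeoutMs = timeoutMs;
  }
}

// The logger arrives as an argument; the component never imports a global one.
function runTool(logger: Logger, taskId: string): void {
  try {
    throw new ToolTimeoutError("inventory_lookup", 2000); // simulated failure
  } catch (err) {
    if (!(err instanceof ToolTimeoutError)) throw err;
    logger.log({
      level: ToolTimeoutError.logLevel,
      component: "tool-runner",
      event: "tool.timeout",
      task_id: taskId,
      tool: err.tool,
      timeout_ms: err.timeoutMs,
    });
  }
}
```

Because the logger and the clock are both injected, a test can pass `new MemoryLogger(() => new Date(0))` and compare the parsed JSON line field by field.
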
  9. Every component has explicit, tested failure modes. For each externally-facing operation: what happens on timeout, on invalid response, on quota exhaustion, on permission denial, on network failure. Each failure mode has at least one test that exercises it and asserts the right Error subclass is thrown with the right structured data.
  10. Timeouts and budgets are enforced, not received. TaskConstraints.time_budget_ms and TaskConstraints.cost_budget_usd already exist in the type system. Every component that accepts them must actually enforce them at runtime: not log them, not decorate with them, enforce. A component that cannot enforce a budget must reject the constraint at intake with a typed error.
  11. Graceful degradation for non-essential subsystems. Long-term memory failing, shared context being empty, or a non-essential tool being unavailable must not crash the agent turn. A warning log and a continued turn beats a crashed turn. The definition of “non-essential” is per-component and documented in that component’s README.
  12. No feature that blocks on an open question ships. If a component depends on a decision that is still in docs/open-questions.md, that question graduates to an ADR before the component ships. This prevents components from encoding an implicit answer that has never been agreed to.
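
The budget-enforcement item has two halves: enforce what you accept, and reject at intake what you cannot enforce. A minimal sketch of both follows. The field names `time_budget_ms` and `cost_budget_usd` come from the text above, but their shape as optional numbers is an assumption, and `CostMeter`, `BudgetExceededError`, `UnsupportedConstraintError`, and `acceptConstraints` are hypothetical names for an imagined component that can meter cost but cannot cancel in-flight work.

```typescript
// Assumed shape; the real TaskConstraints type may differ.
interface TaskConstraints {
  time_budget_ms?: number;
  cost_budget_usd?: number;
}

class BudgetExceededError extends Error {
  readonly limitUsd: number;
  readonly attemptedUsd: number;

  constructor(limitUsd: number, attemptedUsd: number) {
    super(`cost budget exceeded: ${attemptedUsd} USD attempted, limit ${limitUsd} USD`);
    this.name = "BudgetExceededError";
    this.limitUsd = limitUsd;
    this.attemptedUsd = attemptedUsd;
  }
}

class UnsupportedConstraintError extends Error {
  readonly constraint: keyof TaskConstraints;

  constructor(constraint: keyof TaskConstraints) {
    super(`this component cannot enforce "${constraint}"`);
    this.name = "UnsupportedConstraintError";
    this.constraint = constraint;
  }
}

/** Tracks spend and refuses any charge that would cross the limit. */
class CostMeter {
  private spentUsd = 0;
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  /** Call before the paid operation; throws instead of overspending. */
  charge(estimatedUsd: number): void {
    const attempted = this.spentUsd + estimatedUsd;
    if (attempted > this.limitUsd) {
      throw new BudgetExceededError(this.limitUsd, attempted);
    }
    this.spentUsd = attempted;
  }

  get spent(): number {
    return this.spentUsd;
  }
}

// Intake: accept only the constraints this component can actually enforce.
function acceptConstraints(constraints: TaskConstraints): CostMeter {
  if (constraints.time_budget_ms !== undefined) {
    throw new UnsupportedConstraintError("time_budget_ms");
  }
  return new CostMeter(constraints.cost_budget_usd ?? Number.POSITIVE_INFINITY);
}
```

Charging before the call rather than after is a design choice in this sketch: a rejected charge leaves the recorded spend unchanged, so the caller can fall back to a cheaper path or surface the typed error.
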
  13. CI enforces the quality gate on every PR. pnpm check (lint + typecheck + tests) must be green before merge. Branch protection on main. This is the first enforcement mechanism; ADR-0015 (TBD) commits to the specific CI surface.
  14. Every public API is documented. Package-level README describes what the package does and doesn’t do. Every exported symbol has TSDoc covering what it does, what it throws, and any non-obvious invariants.
  15. Untested code paths are not merged. “Public API + happy path + at least one failure mode” is the floor, not the ceiling. For security-critical code the floor is higher (the context assembler has 15 tests for one function; that ratio is a reasonable guide).
  16. No any, no @ts-ignore, no as unknown as T without a comment. The codebase uses exactOptionalPropertyTypes and every strict flag. When an escape hatch is genuinely needed (see the as DelegatedContext cast in the assembler), it is documented at the site and recorded in open-questions if it represents a real gap.
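
To make the documentation and escape-hatch items concrete, here is a small sketch of an exported function with TSDoc that states what it does and what it throws, plus a single cast that is justified at the site. `AgentRef` and `toAgentRef` are hypothetical. In the platform itself this kind of validation would go through a Zod schema, as the first bar requires; the hand-written checks here only keep the example free of dependencies.

```typescript
/** Hypothetical type used only for this illustration. */
interface AgentRef {
  id: string;
  version: number;
}

/**
 * Narrows an untyped value to `AgentRef` without `any` or an unchecked cast.
 *
 * @param value - Untrusted input, for example the result of `JSON.parse`.
 * @returns A new `AgentRef` built from the validated fields.
 * @throws TypeError if `value` is not an object with a non-empty string `id`
 * and a non-negative integer `version`.
 */
function toAgentRef(value: unknown): AgentRef {
  if (typeof value !== "object" || value === null) {
    throw new TypeError("AgentRef must be an object");
  }
  // Escape hatch, documented at the site: `value` is a non-null object here,
  // and every field is read as `unknown` and checked before use.
  const record = value as Record<string, unknown>;
  const { id, version } = record;
  if (typeof id !== "string" || id.length === 0) {
    throw new TypeError("AgentRef.id must be a non-empty string");
  }
  if (typeof version !== "number" || !Number.isInteger(version) || version < 0) {
    throw new TypeError("AgentRef.version must be a non-negative integer");
  }
  // Build a fresh object so unexpected extra fields are not carried along.
  return { id, version };
}
```

The cast widens to `Record<string, unknown>` rather than to `AgentRef`, so the type checker still forces a check on each field before it is used.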

The following were considered and explicitly left out of Phase 1. They are real enterprise concerns but do not belong in the foundation. Each is listed to prevent well-meaning contributors from adding them “to be thorough”:

  • Multi-tenancy, RBAC, SSO. No users yet. Introducing these abstractions before there’s a user model is speculation.
  • On-premise / air-gapped deployment. No customer has asked for it. Architectural choices (Cloudflare-first) actively work against it. When a customer asks, it becomes a product decision, not an architecture decision.
  • Formal certifications (SOC 2, ISO 27001, HIPAA). Certifications validate processes. Bar items 1-16 are the processes. Certification comes when there is a product to certify.
  • Penetration tests, red teams, bug bounties. Belong to a product that exists and has users. Premature today.
  • High availability, multi-region, disaster recovery. Cloudflare’s default posture covers the default case; anything beyond is a product-lifecycle decision.
  • Admin UI, dashboards, compliance reports. Phase 4. The data is produced by the observability bar items above; the UI comes later.

Consequences

  • Every subsequent ADR is measured against this bar. When a later ADR says “accepts YAML from the filesystem,” a reviewer can point to bar items 1, 2, 9, 14 and ask “where’s the schema, threat model entry, failure-mode test, TSDoc?” before the ADR is accepted.
  • The first few components will feel slow. A component that would be 200 lines of code to “make it work” is 600 lines by the time logging, structured errors, failure-mode tests, and audit records are in place. This is the explicit tradeoff the project is making.
  • Retrofit cost is paid upfront, not later. Adding structured logging to 20 components after the fact is an order of magnitude more expensive than building each component with it from day one. Same for audit records, timeout enforcement, and secret redaction.
  • Contributors have a reference. New ADRs and PRs can cite “meets bar 3” or “see bar 10” rather than re-arguing principles. The bar is the contract.
  • Some current code is behind the bar. The context assembler (shipped in ADR-0012) meets bars 1, 2, 4, 8, 14, 15, 16 but does not yet produce audit records (bar 6) or use a structured logger (bar 7), because no audit sink and no logger exist yet. These are not violations today; they are scheduled to be added when the audit-record ADR and the logger ADR land. The bar is forward-looking, not retroactively punitive.
  • This ADR is living. When the bar is wrong (too loose, too strict, missing a concern), it is amended with a new ADR that supersedes this one. Discovering that a bar item doesn’t survive contact with reality is a reason to revise, not to quietly ignore.

Alternatives considered

  • No written bar; rely on reviewer judgment. Works while there is one reviewer. Fails the moment there are two who disagree, or one who gets tired of arguing the same principle on every PR. Also fails silently: the bar drops without anyone noticing.
  • A shorter bar (“just: security, observability, tests”). Tempting, but the phrase “enterprise-ready” is broad enough that a short list gets reinterpreted as “anything I think is enterprise.” The specific 16-item bar is chosen because every item maps to a concrete thing reviewers can point at.
  • Adopt an external standard (OWASP ASVS, CIS benchmarks, SOC 2 control mappings) verbatim. Those are products of their own contexts — designed for web apps, for infrastructure, for compliance audits. Copy-pasting them produces a bar that doesn’t fit this codebase and is ignored. The items above are drawn from those bodies of practice, translated to what this specific project is.
  • Different bars for “core” vs “business packs.” Considered and rejected for Phase 1. Business Packs are the most security-critical code in the system — they handle customer data. Giving them a lower bar than platform code is exactly backwards.
  • Accept the bar but defer enforcement. Bar without enforcement is aspiration. ADR-0015 (below, TBD) commits to the specific CI mechanism that turns bar item 13 into a merge gate.