ADR-0016: Structured logging

Status: Accepted
Date: 2026-04-21

Context

ADR-0013 bar item 7 requires: “Logs are structured, not grep-ready strings. console.log is not acceptable in platform code. Every log entry is JSON with at minimum { level, timestamp, agent_id?, task_id?, component, event, ...payload }. The logger is injected, not imported globally, so tests can assert log content.”

ADR-0014 commits the platform to Cloudflare Workers. Workers Logs is the log sink by default; it accepts structured entries via console.log(JSON.stringify(...)) and routes by console method (e.g. console.error routes to the error level in the dashboard).

Two questions follow: where does the logging implementation come from (pino, another library, or our own), and what interface do consumers actually depend on?

Decision

Build our own minimal logger. Ship it as @agent-platform/logger.

Interface first. Every component takes a Logger interface, never a concrete class, never a module singleton. Consumers accept the interface; composition roots construct the concrete implementation; tests substitute MemoryLogger.
Two concrete implementations. JsonLogger writes one JSON entry per call to a sink (default: console.* routed by level). MemoryLogger accumulates entries in an entries array for test assertions.
Entry shape is fixed by the interface. Every emitted entry has timestamp, level, event (snake_case), and component, plus optional agent_id / task_id (from child() bindings) and any additional payload the caller provides.
Redaction is key-based, case-insensitive, and on by default (callers opt out rather than in). DEFAULT_REDACTED_KEYS covers authorization, cookie, password, secret, token, api_key, apikey, session. Callers may override with a custom list per logger instance.
Synchronous emission. Each call writes JSON.stringify(entry) through console.* immediately, on both Node and Workers. No transports, no queues, no async flush. Because output goes through a sink abstraction, batching can later be added inside this one package if throughput ever demands it.

Implementation lives in packages/logger/. The factory createLogger({ component }) returns a Logger. Concrete classes exist but should not be imported at consumer sites.
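
The decision above can be illustrated with a minimal sketch. It is not the contents of packages/logger/. The six level names, the Sink type, the option names on createLogger, and the console routing table are assumptions made for illustration. The ADR fixes only the entry fields, the six-method surface, child(), and the info default. Redaction is left out here to keep the sketch short.

```typescript
// Sketch only. Level names, the Sink type, and option names are assumptions.
type LogLevel = "trace" | "debug" | "info" | "warn" | "error" | "fatal";
type Payload = Record<string, unknown>;
type Bindings = { agent_id?: string; task_id?: string };
type Sink = (level: LogLevel, line: string) => void;

const LEVEL_ORDER: Record<LogLevel, number> = {
  trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60,
};

// Consumers depend on this interface only, never on a concrete class.
interface Logger {
  trace(event: string, payload?: Payload): void;
  debug(event: string, payload?: Payload): void;
  info(event: string, payload?: Payload): void;
  warn(event: string, payload?: Payload): void;
  error(event: string, payload?: Payload): void;
  fatal(event: string, payload?: Payload): void;
  child(bindings: Bindings): Logger;
}

// Default sink: choose the console method by level (assumed routing table).
const consoleSink: Sink = (level, line) => {
  if (level === "error" || level === "fatal") console.error(line);
  else if (level === "warn") console.warn(line);
  else console.log(line);
};

class JsonLogger implements Logger {
  private readonly component: string;
  private readonly minLevel: LogLevel;
  private readonly sink: Sink;
  private readonly bindings: Bindings;

  constructor(component: string, minLevel: LogLevel, sink: Sink, bindings: Bindings = {}) {
    this.component = component;
    this.minLevel = minLevel;
    this.sink = sink;
    this.bindings = bindings;
  }

  private emit(level: LogLevel, event: string, payload: Payload = {}): void {
    // Entries below the minimum level return before any formatting happens.
    if (LEVEL_ORDER[level] < LEVEL_ORDER[this.minLevel]) return;
    // Fixed fields are spread last so a payload cannot overwrite them.
    const entry = {
      ...payload,
      ...this.bindings,
      timestamp: new Date().toISOString(),
      level,
      component: this.component,
      event,
    };
    this.sink(level, JSON.stringify(entry)); // synchronous, one JSON line per call
  }

  trace(event: string, payload?: Payload): void { this.emit("trace", event, payload); }
  debug(event: string, payload?: Payload): void { this.emit("debug", event, payload); }
  info(event: string, payload?: Payload): void { this.emit("info", event, payload); }
  warn(event: string, payload?: Payload): void { this.emit("warn", event, payload); }
  error(event: string, payload?: Payload): void { this.emit("error", event, payload); }
  fatal(event: string, payload?: Payload): void { this.emit("fatal", event, payload); }

  child(bindings: Bindings): Logger {
    // This sketch copies bindings; it does not try to reproduce sharing by reference.
    return new JsonLogger(this.component, this.minLevel, this.sink, { ...this.bindings, ...bindings });
  }
}

// Composition roots call the factory; consumers only ever see the Logger interface.
function createLogger(options: { component: string; level?: LogLevel; sink?: Sink }): Logger {
  return new JsonLogger(options.component, options.level ?? "info", options.sink ?? consoleSink);
}
```

Passing a custom sink is also how a test or a future batching layer could intercept output without touching call sites.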

Consequences

ADR-0013 bar 7 is mechanically enforceable. A reviewer can search the codebase for console.log in non-logger packages and flag every hit as a bar violation. The rule is “is it in a non-logger package?” — unambiguous.
Tests assert on log content directly. MemoryLogger.entries is an array of LogEntry. A test can assert expect(log.entries[0]).toMatchObject({ level: 'info', event: 'context_assembled' }) without parsing strings or monkey-patching global console.
Logger is a small surface. Six level methods, logError, child, and createLogger — that’s the whole exported API (plus types). Small enough to read in one pass; small enough that a contributor can understand it end-to-end before using it.
Zero third-party production dependencies. Nothing external to audit beyond this package. No “pino broke on Workers runtime X” risk. If the package is ever deleted, consumers break at compile time because the Logger import fails, not silently at runtime.
child() is the correlation-id primitive. At agent-turn boundaries the runtime calls logger.child({ agent_id, task_id }) once; every entry in the turn carries both IDs. Memory-wise this is cheap because child loggers share bindings by reference.
Level filtering is free. JsonLogger defaults to info (production); trace/debug entries are dropped without formatting. MemoryLogger defaults to trace because tests want to see everything.
logError encodes the error-taxonomy contract. Passing an AgentPlatformError instance (from ADR-0017) emits it at the error’s declared severity, with toJSON() producing the payload. Passing a native Error falls back to error level with name + message only. Passing a non-Error coerces via String(). Callers never have to remember to do JSON.stringify(err) — and err.stack is never in the output.
No Workers-specific code in the implementation. JsonLogger’s sink is console.*. Workers Logs picks up JSON output automatically. When @cloudflare/vitest-pool-workers becomes available (ADR-0014’s deferred testing decision), this logger will continue to work unchanged — it isn’t Workers-specific, it’s Workers-compatible.
Audit records are a separate concern, tracked separately. Bar 6 of ADR-0013 requires auditable per-turn records for compliance replay. That is a different data shape and a different persistence story. The audit-record ADR (not yet written) may use this logger as one of its sinks, but the two components are not merged. Doing so would couple unrelated lifecycles.
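
The three logError branches described above can be sketched as a small mapping function. This is a hedged illustration: SketchPlatformError is a hypothetical stand-in for AgentPlatformError, whose real shape lives in ADR-0017, and the severity names, the severity-to-level table, and the helper name errorToLogFields are all assumptions. The ADR also does not state the level for non-Error values, so the sketch assumes error.

```typescript
// Sketch only: SketchPlatformError stands in for AgentPlatformError (ADR-0017).
// Severity names and the severity-to-level mapping are assumptions.
type LogLevel = "trace" | "debug" | "info" | "warn" | "error" | "fatal";
type Severity = "warning" | "error" | "fatal";

const severityToLevel: Record<Severity, LogLevel> = {
  warning: "warn",
  error: "error",
  fatal: "fatal",
};

class SketchPlatformError extends Error {
  readonly code: string;
  readonly severity: Severity;

  constructor(code: string, message: string, severity: Severity) {
    super(message);
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "SketchPlatformError";
    this.code = code;
    this.severity = severity;
  }

  // Deliberately leaves out `stack`.
  toJSON(): Record<string, unknown> {
    return { name: this.name, code: this.code, message: this.message };
  }
}

// Hypothetical helper: decides the level and payload that logError would emit.
function errorToLogFields(err: unknown): { level: LogLevel; payload: Record<string, unknown> } {
  if (err instanceof SketchPlatformError) {
    // Platform errors carry their own severity and serialisation.
    return { level: severityToLevel[err.severity], payload: err.toJSON() };
  }
  if (err instanceof Error) {
    // Native errors: name and message only, never the stack.
    return { level: "error", payload: { name: err.name, message: err.message } };
  }
  // Anything else is coerced with String(); the level here is an assumption.
  return { level: "error", payload: { message: String(err) } };
}

const sketchExamples = [
  errorToLogFields(new SketchPlatformError("budget_exceeded", "token budget exceeded", "warning")),
  errorToLogFields(new TypeError("bad input")),
  errorToLogFields("plain string failure"),
];
for (const example of sketchExamples) console.log(JSON.stringify(example));
```

Keeping the branching in one place means call sites pass whatever they caught and never decide what is safe to serialise.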

Consequences for the repo

New workspace package: packages/logger/. Depends only on the workspace package @agent-platform/errors; no third-party runtime dependencies.
Exported types: Logger, LogEntry, LogLevel, LoggerOptions.
Exported values: createLogger, JsonLogger, MemoryLogger, redact, DEFAULT_REDACTED_KEYS, REDACTED_PLACEHOLDER, LOG_LEVEL_ORDER, errorSeverityToLogLevel.
Existing code (context assembler, schemas) does not emit logs yet and is not required to change. Components that do emit logs, from this point forward, take a Logger parameter and route through it.
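
As an illustration of a component that takes a Logger parameter, the sketch below shows a consumer and a memory-backed logger used to assert on its output. The Logger interface is trimmed to two of the six level methods for brevity. The runTurn function and the context_empty event are hypothetical, and sharing one entries array between a parent and its children is an assumption about how MemoryLogger behaves.

```typescript
// Sketch only: trimmed interface, hypothetical consumer, assumed MemoryLogger behaviour.
type LogLevel = "info" | "warn";
type Payload = Record<string, unknown>;
type Bindings = { agent_id?: string; task_id?: string };

interface LogEntry {
  timestamp: string;
  level: LogLevel;
  component: string;
  event: string;
  agent_id?: string;
  task_id?: string;
  [key: string]: unknown;
}

interface Logger {
  info(event: string, payload?: Payload): void;
  warn(event: string, payload?: Payload): void;
  child(bindings: Bindings): Logger;
}

class MemoryLogger implements Logger {
  readonly entries: LogEntry[];
  private readonly component: string;
  private readonly bindings: Bindings;

  constructor(component: string, entries: LogEntry[] = [], bindings: Bindings = {}) {
    this.component = component;
    this.entries = entries;
    this.bindings = bindings;
  }

  private record(level: LogLevel, event: string, payload: Payload = {}): void {
    this.entries.push({
      ...payload,
      ...this.bindings,
      timestamp: new Date().toISOString(),
      level,
      component: this.component,
      event,
    });
  }

  info(event: string, payload?: Payload): void { this.record("info", event, payload); }
  warn(event: string, payload?: Payload): void { this.record("warn", event, payload); }

  child(bindings: Bindings): Logger {
    // Children write into the parent's array so a test can inspect one list (assumption).
    return new MemoryLogger(this.component, this.entries, { ...this.bindings, ...bindings });
  }
}

// Hypothetical consumer: accepts the interface and binds correlation IDs once per turn.
function runTurn(logger: Logger, agentId: string, taskId: string, documents: string[]): number {
  const log = logger.child({ agent_id: agentId, task_id: taskId });
  log.info("context_assembled", { document_count: documents.length });
  if (documents.length === 0) log.warn("context_empty");
  return documents.length;
}

// In a test, the composition root is the test itself.
const memoryLog = new MemoryLogger("context_assembler");
runTurn(memoryLog, "agent-7", "task-42", ["doc-a", "doc-b"]);
```

Because runTurn never imports a concrete logger, the same function runs unchanged with whatever createLogger returns in production.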

Alternatives considered

Pino. The Node ecosystem default. Well-tested, fast, extensive plugin ecosystem. Rejected because most of what makes pino valuable on Node (async transports, worker threads for serialisation, pretty-printing transforms) is either unavailable on Workers or an outright hazard there. Using pino on Workers in practice means disabling its interesting features and leaning on it as a JSON formatter with redaction, at which point the 30 KB of pino code plus a transitive-dep graph buys us nothing. A 100-line JSON formatter with redaction is simpler to audit and maintain.
Winston. Heavier than pino with more indirection (transports, formats as separate abstractions). Same rejection reason as pino, more strongly.
Vercel AI SDK logger / LogTape / Workers-native libraries. Each adds a dependency we do not need. Our log needs are small; the interface is small; the implementation is small. Taking on a non-trivial audit surface to save ~150 lines is not a good trade.
Module-global singleton (getLogger()). A familiar pattern from Java/Python. Rejected because tests then have to monkey-patch the module, bindings can’t be set per-turn without threading a context variable through every call, and the “inject a dependency” pattern is already in place for everything else in the platform — a singleton logger would be the one exception.
Single log(level, event, payload) method, level constants as sugar. Considered and rejected. The leveled methods (logger.info, logger.warn) read more naturally at call sites and the API cost is small (six methods instead of one). Readability wins.
Include stack traces in logError output by default. Rejected. Stack traces can contain file paths that leak infrastructure (build-server layout, bundler intermediate files). ADR-0013 bar 3 treats this as a redaction concern. The toJSONWithStack() method on AgentPlatformError exists for explicit diagnostic use; it is not the default path.
Value-based redaction (regex for JWTs, API-key formats, etc.). Rejected. Value-based redaction produces false positives that silently corrupt legitimate data. Key-based redaction puts the decision at the code level where it belongs, with a documented default list that catches the obvious suspects.
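
Key-based redaction as chosen above can be sketched in a few lines. This is an illustration under stated assumptions, not the package's redact function: the placeholder string "[REDACTED]", exact key matching after lowercasing, and recursion into nested objects and arrays are all assumed. The key list is the one given in the Decision section.

```typescript
// Sketch only. The placeholder value, exact-match rule, and recursion are assumptions.
const DEFAULT_REDACTED_KEYS: readonly string[] = [
  "authorization", "cookie", "password", "secret",
  "token", "api_key", "apikey", "session",
];
const REDACTED_PLACEHOLDER = "[REDACTED]"; // assumed value

function redact(value: unknown, keys: readonly string[] = DEFAULT_REDACTED_KEYS): unknown {
  // Lowercase once so matching is case-insensitive.
  const blocked = new Set(keys.map((key) => key.toLowerCase()));

  const walk = (node: unknown): unknown => {
    if (Array.isArray(node)) return node.map((item) => walk(item));
    if (node !== null && typeof node === "object") {
      const copy: Record<string, unknown> = {};
      for (const [key, inner] of Object.entries(node)) {
        // The decision is made on the key alone; the value is never inspected.
        copy[key] = blocked.has(key.toLowerCase()) ? REDACTED_PLACEHOLDER : walk(inner);
      }
      return copy;
    }
    return node; // primitives pass through untouched
  };

  return walk(value); // returns a copy; does not handle cyclic structures
}

const redactedSample = redact({
  Authorization: "Bearer abc123",
  user: "ada",
  nested: { api_key: "k-1", region: "eu" },
  note: "eyJhbGciOi.example", // token-like value under an unlisted key is kept
});
console.log(JSON.stringify(redactedSample));
```

The note field shows the trade-off the ADR accepts: a token-like value under an unlisted key passes through, and in exchange legitimate data is never rewritten by a pattern match.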