Order Triage

The platform’s most-exercised scenario. A customer email lands; within a single agent run, it gets classified, a refund decision gets made (informed by the customer’s history), and a customer-facing reply gets drafted — all without a human in the loop.

This is the scenario that makes the platform’s distinctive features concrete: multi-agent delegation, long-term memory, event-driven side effects, and YAML-defined agents.

The setup

A small e-commerce store gets dozens of customer emails per day. Most are mundane: shipping questions, product questions, the occasional complaint. A meaningful slice are refund requests. Triaging them costs time; deciding on them costs judgment (small refunds are obvious, larger or repeat-customer cases are not); and replying to customers in the right tone costs care.

The order-triage scenario automates the first two of these and drafts the third for human review. Three agents collaborate:

triage — the front door. Classifies every incoming email; routes refund cases to the decision sub-agent; forwards everything else to a human queue.
refund_decision — the policy authority. Recalls past cases for this customer, applies refund policy, decides: auto-approve, escalate, or deny.
communication — the customer-facing voice. Drafts the reply email matching the decision, in the right tone, under 120 words.

The flow

A typical refund case, end to end:

Triage receives the request:

HTTP client → triage: POST /run with { instructions: "Anna says her towel arrived damaged. She wants a refund.", payload: { email: "anna@example.com" } }
triage runs LLM turn 1, classifies the email as “refund or return request”
triage runs LLM turn 2, calls shopify_get_order_by_email for anna@example.com — finds order #1042, total $49 SEK, ordered 12 days ago
triage runs LLM turn 3, decides to delegate. Calls delegate_to_refund_decision with the order details and the customer’s reasoning

The runtime synthesizes the delegation tool from the sub-agent list and invokes runTurn() recursively:

triage → refund_decision: sub-agent gets a fresh six-layer context — its own system prompt (the policy authority), its own characteristics (careful, evidence-citing), its own tool list
refund_decision → long-term memory: recall_memory("refund history for anna@example.com") returns 1 prior match: a successful $32 refund 4 months ago for a different item
refund_decision runs LLM turn 1, reasons:
- Order total $49 SEK, under the $50 auto-approve threshold ✓
- One prior refund 4 months ago, not a pattern ✓
- Reason given (damaged on arrival), on the auto-approve list ✓
- Order is 12 days old, well within the 30-day window ✓
- Decision: auto_approve
refund_decision → shopify-actions queue: emit_event with { action: "refund", order_id: "#1042", amount: 49, reason: "damaged on arrival" }
refund_decision → long-term memory: store_memory("Approved $49 refund for order #1042, anna@example.com, reason: damaged on arrival, no prior pattern")
refund_decision delegates to communication

The communication sub-agent drafts the reply:

refund_decision → communication: sub-agent gets the decision (auto_approve) and the context (order number, amount, reason)
communication runs LLM turn 1, picks the apologetic-warm-action-oriented tone block, drafts a 47-word reply confirming the refund and setting expectations for the 3-5 business day return

The chain unwinds:

communication → refund_decision: returns the email body
refund_decision → triage: AgentReport { decision: "auto_approve", summary, email_draft }
triage → HTTP client: 200 OK with the consolidated report

Side effects: the shopify-actions queue now has one event ready for a human-or-machine consumer to act on. In Phase 1 the consumer logs the event and stops; in Phase 2 it’ll execute the actual Shopify mutation behind a human approval gate.

What this demonstrates

Platform feature	How this scenario uses it
YAML agent definitions (ADR-0031)	All three agents are defined as YAML in `apps/worker/agents/`; prompts are separate Markdown files
Six-layer context (ADR-0006)	Each agent gets fresh layers 1-2 (its identity); layer 4 carries the delegated task; layers 5-6 carry runtime state and recalled memories
Delegation as tool (ADR-0022)	`delegate_to_refund_decision` and `delegate_to_communication` are synthesized at runtime from the sub-agent lists
Long-term memory (ADR-0030)	`refund_decision` calls `recall_memory` (read) and `store_memory` (write); memories scoped per-agent and per-tenant
Event emission (ADR-0032)	`emit_event` puts side-effect requests on the `shopify-actions` queue; consumers handle them async
Custom Shopify tool	`shopify_get_order_by_email` is a hand-registered tool in the Worker’s tool registry
Hard constraints in prompts	Each agent’s YAML lists its `hard_constraints`; the runtime composes them into the system prompt; the LLM enforces them
Model tiering (ADR-0008)	`triage` and `refund_decision` use `model_tier: main` (Sonnet); `communication` uses `model_tier: sub` (Haiku — faster and cheaper for short drafting)

The artifacts

The actual YAML and Markdown files that define this scenario. These are bundled into the deployed Worker (ADR-0033); they ship as part of the codebase, not as runtime config.

`apps/worker/agents/triage.yaml`

apiVersion: agent-platform/v1
kind: Agent

metadata:
  id: agent-triage
  name: triage
  version: 0.1.0
  role: main
  tags: [crm, scenario-b]

model_tier: main

core_context:
  system_prompt:
    file: ./prompts/triage.md
  identity: an order-triage agent that classifies incoming customer emails and routes them
  hard_constraints:
    - never expose internal Shopify order IDs (gid://shopify/Order/...) in any output
    - always cite which tool returned the data you reference
    - never speculate about refund eligibility — that is the refund_decision agent's job

characteristics:
  personality: methodical, polite, neutral
  decision_style: balanced
  tone: professional

tools:
  - shopify_get_order_by_email
  - emit_event
  - delegate_to_refund_decision

sub_agents:
  - refund_decision

memory_config:
  working_memory_window: 10
  long_term_enabled: false
  shared_context_scopes: []

autonomy:
  max_delegation_depth: 2
  requires_human_approval: []
  allowed_sub_agents:
    - refund_decision

escalation_rules:
  - condition: order_not_found
    target: human
    reason: cannot find an order matching the customer's email — human should verify the customer
  - condition: language_uncertain
    target: human
    reason: email language is unclear and confidence is low

`apps/worker/agents/prompts/triage.md` (excerpt)

# Order Triage Agent

You are the triage agent for an e-commerce store. You receive
customer emails and decide what kind of attention each one needs.
You are the first point of contact in the platform's
order-handling flow.

## Your job

For every incoming email, you must:

1. **Identify the customer's intent.** Most emails are one of:
   - Refund or return request — customer wants money back
   - Shipping question — where is my order
   - Product question — features, compatibility, sizing
   - Complaint — dissatisfaction without a specific request
   - Other — anything that doesn't fit cleanly

2. **Find the relevant order, if any.** Use
   `shopify_get_order_by_email` with the customer's email.

3. **Hand off appropriately.**
   - **Refund or return requests** with a found order →
     delegate to the `refund_decision` sub-agent
   - **Anything else** → emit a `human_review` event

## Hard rules

- Never expose internal order IDs (`gid://shopify/Order/...`)
- Never speculate about refund eligibility yourself
- Always cite which tool you called when explaining what you found

## Tone

Methodical. Polite. Neutral. You're a router, not a salesperson.

The full prompt is ~50 lines and lives in the repo at apps/worker/agents/prompts/triage.md.

`apps/worker/agents/refund-decision.yaml`

apiVersion: agent-platform/v1
kind: Agent

metadata:
  id: agent-refund-decision
  name: refund_decision
  version: 0.1.0
  role: sub_agent
  tags: [crm, scenario-b, policy]

model_tier: main

core_context:
  system_prompt:
    file: ./prompts/refund-decision.md
  identity: a refund-decision sub-agent that applies policy and recalls past cases
  hard_constraints:
    - always recall memory before deciding — non-negotiable
    - never auto-approve refunds over $50
    - never deny without escalating to human review first
    - cite memory results in your reasoning
    - store the decision before completing

characteristics:
  personality: careful, evidence-citing, slightly skeptical of unusual cases
  decision_style: balanced
  tone: precise

tools:
  - recall_memory
  - store_memory
  - emit_event
  - delegate_to_communication

sub_agents:
  - communication

memory_config:
  working_memory_window: 15
  long_term_enabled: true
  shared_context_scopes: []

autonomy:
  max_delegation_depth: 1
  requires_human_approval: []
  allowed_sub_agents:
    - communication

escalation_rules:
  - condition: amount_over_threshold
    target: human
    reason: order total is above the auto-approve threshold
  - condition: prior_refund_pattern
    target: human
    reason: customer has multiple prior refund requests — pattern needs review
  - condition: tracking_disputed
    target: human
    reason: customer claims non-receipt but tracking shows delivery — needs carrier follow-up

`apps/worker/agents/prompts/refund-decision.md` (excerpt)

# Refund Decision Agent

You decide what to do about refund and return requests. You are
the policy authority.

## Your job

Produce one of three decisions:

1. **`auto_approve`** — refund issued without human review:
   - Order total under $50
   - Customer's first refund (check long-term memory)
   - Reason on the auto-approve list (broken on arrival,
     wrong item, never received)
   - Order is within 30 days

2. **`escalate`** — send to human review:
   - Order total over $50
   - Customer has prior refund requests
   - Reason is unusual or sounds like a policy edge case
   - Order is 30–90 days old
   - Anything that smells off

3. **`deny`** — refund refused (rare; only for clear cases):
   - Order older than 90 days
   - Customer has been refunded for the same item before
   - Reason is clearly unreasonable

## Process

1. **Recall memory.** Always start with `recall_memory`.
2. **Reason explicitly.** Walk through the decision branch.
3. **Emit the decision event.**
4. **Store the decision.**

## Hard rules

- Always recall memory before deciding. Non-negotiable.
- Never auto-approve over $50.
- Never deny without escalating first.
- Cite memory in your reasoning.
- Store the decision before completing.

`apps/worker/agents/communication.yaml`

apiVersion: agent-platform/v1
kind: Agent

metadata:
  id: agent-communication
  name: communication
  version: 0.1.0
  role: sub_agent

model_tier: sub

core_context:
  system_prompt:
    file: ./prompts/communication.md
  identity: a customer-communication sub-agent that drafts reply emails matching the upstream decision
  hard_constraints:
    - never expose internal Shopify order IDs in customer-facing output
    - never reference the platform or that an AI is involved
    - sign every email "The team" — no fake personal names
    - never invent specifics like refund amounts not in the decision context
    - cap output at 120 words

characteristics:
  personality: warm but precise, never over-explains
  decision_style: conservative
  tone: professional, brief

tools: []

sub_agents: []

memory_config:
  working_memory_window: 10
  long_term_enabled: false

autonomy:
  max_delegation_depth: 0
  allowed_sub_agents: []

The communication agent is intentionally minimal: no tools, no sub-agents, no memory. Pure drafting. Its job is to translate a decision into a customer-appropriate email.

Run it yourself

The full end-to-end demo against deployed Cloudflare infrastructure is scripted in apps/worker/scripts/e2e-demo.sh. Operationally:

# 1. Deploy the Worker (one-time)
cd apps/worker
pnpm wrangler deploy

# 2. Set required secrets (one-time)
echo "your-anthropic-key" | pnpm wrangler secret put ANTHROPIC_API_KEY
echo "your-openai-key"    | pnpm wrangler secret put OPENAI_API_KEY
echo "your-shopify-token" | pnpm wrangler secret put SHOPIFY_ACCESS_TOKEN
echo "$(openssl rand -hex 32)" | pnpm wrangler secret put WORKER_AUTH_TOKEN
# (also: SHOPIFY_SHOP_DOMAIN as a plain config var)

# 3. Seed long-term memory with refund history fixtures (one-time)
curl -X POST https://<your-worker>.workers.dev/admin/seed-memory \
  -H "Authorization: Bearer $WORKER_AUTH_TOKEN"

# 4. Run the order-triage scenario
RUN_E2E=1 \
WORKER_URL=https://<your-worker>.workers.dev \
WORKER_AUTH_TOKEN="<your-token>" \
./apps/worker/scripts/e2e-demo.sh

The script asserts the expected tool_calls and decision shape; it exits non-zero on regression. Total cost per run: ~$0.05 (Anthropic + OpenAI). Total wall time: ~30-60 seconds.

What’s next

This scenario sits at a deliberate boundary in Phase 1: agents emit action events but don’t execute them. The shopify-actions queue consumer logs and stops. Phase 2 closes that loop:

Real Shopify mutations — the queue consumer becomes a Shopify writer. refund events trigger actual refunds.
Human approval gates — high-stakes decisions (large refunds, denial communications) wait for human approval before the consumer executes.
Idempotency keys per event.id — a delivered-twice event doesn’t refund twice.
Dead letter queues + retry policy — failed mutations get retried with exponential backoff; persistent failures land in a DLQ for human inspection.

The order-triage scenario stays the same; what changes is what happens after the agent emits the event.

Where to next

Merchandising — the simpler, already-deployed scenario
B2B SaaS hypothetical — what a different vertical looks like on the same platform
Architecture — the synthesis page showing how this scenario maps to the platform’s surface
ADR-0030 — long-term memory — the design behind recall_memory, the most distinctive part of this scenario