Skip to content

Order Triage

The platform’s most-exercised scenario. A customer email lands; within a single agent run, it gets classified, a refund decision gets made (informed by the customer’s history), and a customer-facing reply gets drafted — all without a human in the loop.

This is the scenario that makes the platform’s distinctive features concrete: multi-agent delegation, long-term memory, event-driven side effects, and YAML-defined agents.

A small e-commerce store gets dozens of customer emails per day. Most are mundane: shipping questions, product questions, the occasional complaint. A meaningful slice are refund requests. Triaging them costs time; deciding on them costs judgment (small refunds are obvious, larger or repeat-customer cases are not); and replying to customers in the right tone costs care.

The order-triage scenario automates the first two of these and drafts the third for human review. Three agents collaborate:

  1. triage — the front door. Classifies every incoming email; routes refund cases to the decision sub-agent; forwards everything else to a human queue.
  2. refund_decision — the policy authority. Recalls past cases for this customer, applies refund policy, decides: auto-approve, escalate, or deny.
  3. communication — the customer-facing voice. Drafts the reply email matching the decision, in the right tone, under 120 words.

A typical refund case, end to end:

Triage receives the request:

  1. HTTP client → triage: POST /run with { instructions: "Anna says her towel arrived damaged. She wants a refund.", payload: { email: "anna@example.com" } }
  2. triage runs LLM turn 1, classifies the email as “refund or return request”
  3. triage runs LLM turn 2, calls shopify_get_order_by_email for anna@example.com — finds order #1042, total $49 SEK, ordered 12 days ago
  4. triage runs LLM turn 3, decides to delegate. Calls delegate_to_refund_decision with the order details and the customer’s reasoning

The runtime synthesizes the delegation tool from the sub-agent list and invokes runTurn() recursively:

  1. triage → refund_decision: sub-agent gets a fresh six-layer context — its own system prompt (the policy authority), its own characteristics (careful, evidence-citing), its own tool list
  2. refund_decision → long-term memory: recall_memory("refund history for anna@example.com") returns 1 prior match: a successful $32 refund 4 months ago for a different item
  3. refund_decision runs LLM turn 1, reasons:
    • Order total $49 SEK, under the $50 auto-approve threshold
    • One prior refund 4 months ago, not a pattern
    • Reason given (damaged on arrival), on the auto-approve list
    • Order is 12 days old, well within the 30-day window
    • Decision: auto_approve
  4. refund_decision → shopify-actions queue: emit_event with { action: "refund", order_id: "#1042", amount: 49, reason: "damaged on arrival" }
  5. refund_decision → long-term memory: store_memory("Approved $49 refund for order #1042, anna@example.com, reason: damaged on arrival, no prior pattern")
  6. refund_decision delegates to communication

The communication sub-agent drafts the reply:

  1. refund_decision → communication: sub-agent gets the decision (auto_approve) and the context (order number, amount, reason)
  2. communication runs LLM turn 1, picks the apologetic-warm-action-oriented tone block, drafts a 47-word reply confirming the refund and setting expectations for the 3-5 business day return

The chain unwinds:

  1. communication → refund_decision: returns the email body
  2. refund_decision → triage: AgentReport { decision: "auto_approve", summary, email_draft }
  3. triage → HTTP client: 200 OK with the consolidated report

Side effects: the shopify-actions queue now has one event ready for a human-or-machine consumer to act on. In Phase 1 the consumer logs the event and stops; in Phase 2 it’ll execute the actual Shopify mutation behind a human approval gate.

Platform featureHow this scenario uses it
YAML agent definitions (ADR-0031)All three agents are defined as YAML in apps/worker/agents/; prompts are separate Markdown files
Six-layer context (ADR-0006)Each agent gets fresh layers 1-2 (its identity); layer 4 carries the delegated task; layers 5-6 carry runtime state and recalled memories
Delegation as tool (ADR-0022)delegate_to_refund_decision and delegate_to_communication are synthesized at runtime from the sub-agent lists
Long-term memory (ADR-0030)refund_decision calls recall_memory (read) and store_memory (write); memories scoped per-agent and per-tenant
Event emission (ADR-0032)emit_event puts side-effect requests on the shopify-actions queue; consumers handle them async
Custom Shopify toolshopify_get_order_by_email is a hand-registered tool in the Worker’s tool registry
Hard constraints in promptsEach agent’s YAML lists its hard_constraints; the runtime composes them into the system prompt; the LLM enforces them
Model tiering (ADR-0008)triage and refund_decision use model_tier: main (Sonnet); communication uses model_tier: sub (Haiku — faster and cheaper for short drafting)

The actual YAML and Markdown files that define this scenario. These are bundled into the deployed Worker (ADR-0033); they ship as part of the codebase, not as runtime config.

apiVersion: agent-platform/v1
kind: Agent
metadata:
id: agent-triage
name: triage
version: 0.1.0
role: main
tags: [crm, scenario-b]
model_tier: main
core_context:
system_prompt:
file: ./prompts/triage.md
identity: an order-triage agent that classifies incoming customer emails and routes them
hard_constraints:
- never expose internal Shopify order IDs (gid://shopify/Order/...) in any output
- always cite which tool returned the data you reference
- never speculate about refund eligibility — that is the refund_decision agent's job
characteristics:
personality: methodical, polite, neutral
decision_style: balanced
tone: professional
tools:
- shopify_get_order_by_email
- emit_event
- delegate_to_refund_decision
sub_agents:
- refund_decision
memory_config:
working_memory_window: 10
long_term_enabled: false
shared_context_scopes: []
autonomy:
max_delegation_depth: 2
requires_human_approval: []
allowed_sub_agents:
- refund_decision
escalation_rules:
- condition: order_not_found
target: human
reason: cannot find an order matching the customer's email — human should verify the customer
- condition: language_uncertain
target: human
reason: email language is unclear and confidence is low

apps/worker/agents/prompts/triage.md (excerpt)

Section titled “apps/worker/agents/prompts/triage.md (excerpt)”
# Order Triage Agent
You are the triage agent for an e-commerce store. You receive
customer emails and decide what kind of attention each one needs.
You are the first point of contact in the platform's
order-handling flow.
## Your job
For every incoming email, you must:
1. **Identify the customer's intent.** Most emails are one of:
- Refund or return request — customer wants money back
- Shipping question — where is my order
- Product question — features, compatibility, sizing
- Complaint — dissatisfaction without a specific request
- Other — anything that doesn't fit cleanly
2. **Find the relevant order, if any.** Use
`shopify_get_order_by_email` with the customer's email.
3. **Hand off appropriately.**
- **Refund or return requests** with a found order →
delegate to the `refund_decision` sub-agent
- **Anything else** → emit a `human_review` event
## Hard rules
- Never expose internal order IDs (`gid://shopify/Order/...`)
- Never speculate about refund eligibility yourself
- Always cite which tool you called when explaining what you found
## Tone
Methodical. Polite. Neutral. You're a router, not a salesperson.

The full prompt is ~50 lines and lives in the repo at apps/worker/agents/prompts/triage.md.

apiVersion: agent-platform/v1
kind: Agent
metadata:
id: agent-refund-decision
name: refund_decision
version: 0.1.0
role: sub_agent
tags: [crm, scenario-b, policy]
model_tier: main
core_context:
system_prompt:
file: ./prompts/refund-decision.md
identity: a refund-decision sub-agent that applies policy and recalls past cases
hard_constraints:
- always recall memory before deciding — non-negotiable
- never auto-approve refunds over $50
- never deny without escalating to human review first
- cite memory results in your reasoning
- store the decision before completing
characteristics:
personality: careful, evidence-citing, slightly skeptical of unusual cases
decision_style: balanced
tone: precise
tools:
- recall_memory
- store_memory
- emit_event
- delegate_to_communication
sub_agents:
- communication
memory_config:
working_memory_window: 15
long_term_enabled: true
shared_context_scopes: []
autonomy:
max_delegation_depth: 1
requires_human_approval: []
allowed_sub_agents:
- communication
escalation_rules:
- condition: amount_over_threshold
target: human
reason: order total is above the auto-approve threshold
- condition: prior_refund_pattern
target: human
reason: customer has multiple prior refund requests — pattern needs review
- condition: tracking_disputed
target: human
reason: customer claims non-receipt but tracking shows delivery — needs carrier follow-up

apps/worker/agents/prompts/refund-decision.md (excerpt)

Section titled “apps/worker/agents/prompts/refund-decision.md (excerpt)”
# Refund Decision Agent
You decide what to do about refund and return requests. You are
the policy authority.
## Your job
Produce one of three decisions:
1. **`auto_approve`** — refund issued without human review:
- Order total under $50
- Customer's first refund (check long-term memory)
- Reason on the auto-approve list (broken on arrival,
wrong item, never received)
- Order is within 30 days
2. **`escalate`** — send to human review:
- Order total over $50
- Customer has prior refund requests
- Reason is unusual or sounds like a policy edge case
- Order is 30–90 days old
- Anything that smells off
3. **`deny`** — refund refused (rare; only for clear cases):
- Order older than 90 days
- Customer has been refunded for the same item before
- Reason is clearly unreasonable
## Process
1. **Recall memory.** Always start with `recall_memory`.
2. **Reason explicitly.** Walk through the decision branch.
3. **Emit the decision event.**
4. **Store the decision.**
## Hard rules
- Always recall memory before deciding. Non-negotiable.
- Never auto-approve over $50.
- Never deny without escalating first.
- Cite memory in your reasoning.
- Store the decision before completing.
apiVersion: agent-platform/v1
kind: Agent
metadata:
id: agent-communication
name: communication
version: 0.1.0
role: sub_agent
model_tier: sub
core_context:
system_prompt:
file: ./prompts/communication.md
identity: a customer-communication sub-agent that drafts reply emails matching the upstream decision
hard_constraints:
- never expose internal Shopify order IDs in customer-facing output
- never reference the platform or that an AI is involved
- sign every email "The team" — no fake personal names
- never invent specifics like refund amounts not in the decision context
- cap output at 120 words
characteristics:
personality: warm but precise, never over-explains
decision_style: conservative
tone: professional, brief
tools: []
sub_agents: []
memory_config:
working_memory_window: 10
long_term_enabled: false
autonomy:
max_delegation_depth: 0
allowed_sub_agents: []

The communication agent is intentionally minimal: no tools, no sub-agents, no memory. Pure drafting. Its job is to translate a decision into a customer-appropriate email.

The full end-to-end demo against deployed Cloudflare infrastructure is scripted in apps/worker/scripts/e2e-demo.sh. Operationally:

Terminal window
# 1. Deploy the Worker (one-time)
cd apps/worker
pnpm wrangler deploy
# 2. Set required secrets (one-time)
echo "your-anthropic-key" | pnpm wrangler secret put ANTHROPIC_API_KEY
echo "your-openai-key" | pnpm wrangler secret put OPENAI_API_KEY
echo "your-shopify-token" | pnpm wrangler secret put SHOPIFY_ACCESS_TOKEN
echo "$(openssl rand -hex 32)" | pnpm wrangler secret put WORKER_AUTH_TOKEN
# (also: SHOPIFY_SHOP_DOMAIN as a plain config var)
# 3. Seed long-term memory with refund history fixtures (one-time)
curl -X POST https://<your-worker>.workers.dev/admin/seed-memory \
-H "Authorization: Bearer $WORKER_AUTH_TOKEN"
# 4. Run the order-triage scenario
RUN_E2E=1 \
WORKER_URL=https://<your-worker>.workers.dev \
WORKER_AUTH_TOKEN="<your-token>" \
./apps/worker/scripts/e2e-demo.sh

The script asserts the expected tool_calls and decision shape; it exits non-zero on regression. Total cost per run: ~$0.05 (Anthropic + OpenAI). Total wall time: ~30-60 seconds.

This scenario sits at a deliberate boundary in Phase 1: agents emit action events but don’t execute them. The shopify-actions queue consumer logs and stops. Phase 2 closes that loop:

  • Real Shopify mutations — the queue consumer becomes a Shopify writer. refund events trigger actual refunds.
  • Human approval gates — high-stakes decisions (large refunds, denial communications) wait for human approval before the consumer executes.
  • Idempotency keys per event.id — a delivered-twice event doesn’t refund twice.
  • Dead letter queues + retry policy — failed mutations get retried with exponential backoff; persistent failures land in a DLQ for human inspection.

The order-triage scenario stays the same; what changes is what happens after the agent emits the event.