Escalation is where agentic systems earn or lose trust. The difference between clean handoffs and buried problems is twenty lines of policy and one observable signal.
Policy beats prompts
Prompts ask nicely. Policy enforces behavior. Production escalation needs thresholds, named owners, and logging — not "please involve a human when unsure."
We write escalation rules as code-adjacent specs: trigger, owner, channel, SLA, and fallback.
Observable signals
Confidence scores alone are unreliable. We pair them with business signals: deal size, sentiment shift, compliance keywords, or repeat failure on the same ticket type.
Every escalation emits an event the manager sees in the same dashboard as the KPI.
If the manager learns about failures from customers, the escalation design failed.
Testing escalation before launch
We run red-team scripts against escalation paths before go-live: ambiguous inputs, adversarial users, and missing data cases.
Agents that pass happy-path demos but fail red-team stay in shadow mode.