How should you design escalation from AI to a human?

Design escalation from AI to a human as a hand-off, not a transfer. The customer should not start the conversation over. The human should pick up with the customer's identity, the conversation so far, what the AI tried, and what it could not do. That requires building escalation as a first-class part of the system, not as the place the AI dumps the contacts it could not handle.

A customer types out a long, careful description of their problem to the AI. The AI cannot help. The conversation transfers to a human agent. The agent's first message is "hi, how can I help you?" The customer types the whole thing again, with less patience. Most escalation breakdowns look exactly like this. The system handed off the customer; it did not hand off the context.

What people in the field are saying

Service Matters has a piece in "Demystifying orchestration: the key..." arguing that the orchestration between agents (AI and human) is the difference between an AI deployment that helps customers and one that frustrates them.

What does a clean handover include?

Four things, minimum. The customer's identity, already verified. The conversation transcript, available to the agent on the same screen. A short summary of what the customer is asking for, written by the AI. A list of what the AI tried, with results (looked up the order, found a delay; tried to issue a credit, blocked by the rule on amount). The human starts with all four in view.

When should the AI escalate?

Four moments. When its confidence drops below a defined threshold, before it tries to bluff. When the customer asks for a human, on the first ask, not the third. When the action requested is outside the AI's authorised scope. When the customer signals distress (frustrated language, repeated re-asks of the same question).

Where does escalation go wrong?

The AI tries too long, frustrating the customer; or the AI escalates too fast, into a queue with no humans available, leaving the customer worse off than if the AI had attempted. The threshold is real work to tune, and it has to be tuned to current capacity, not theoretical.

How do you measure whether escalation is working?

Three indicators. Whether the customer repeated their problem to the human (a count from transcripts, easy to instrument). Whether the human resolved the escalated case on first contact. Customer-reported "did the handover help" on the post-contact survey. If any of the three is bad, the escalation is broken, regardless of how good the AI looked on the contacts it kept.