Why are AI containment numbers misleading?

AI containment rate measures the share of customer contacts an automated channel handled without escalation to a human, but it does not measure whether the customer's problem was actually solved. A customer who gives up and switches to a different channel still counts as contained.

A contact centre's bot dashboard shows 75% containment. The leadership team takes that as a sign the AI is doing its job. Meanwhile, phone-line volume is creeping up, and the same customers who left the chat at 2pm are calling at 2:30pm to ask the same thing. The dashboard sees a win. The customer feels none.

What people in the field are saying

CX Decoded has a piece, "The AI metrics mirage: why your contact...", arguing that metrics dashboards for AI customer service flatter operators by counting outcomes the customer would not count as success.

What does containment actually measure?

Containment is, at its plainest, the share of bot sessions that ended without a transfer to a human in the same session. That is a workload metric, not a resolution metric. It tells you how often the bot kept the conversation. It does not tell you whether the customer got what they came for.
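A minimal sketch of that definition, in Python. The BotSession record and its transferred_to_human field are hypothetical names, not taken from any particular platform; the point is only that the calculation sees a single yes/no flag per session and nothing about the outcome.

```python
from dataclasses import dataclass


@dataclass
class BotSession:
    """Hypothetical session record; field names are illustrative assumptions."""
    session_id: str
    transferred_to_human: bool  # True if handed to an agent in the same session


def containment_rate(sessions):
    """Share of bot sessions that ended without a same-session transfer."""
    if not sessions:
        return 0.0
    contained = sum(1 for s in sessions if not s.transferred_to_human)
    return contained / len(sessions)


sessions = [
    BotSession("s1", transferred_to_human=False),  # customer got what they needed
    BotSession("s2", transferred_to_human=False),  # customer closed the chat in frustration
    BotSession("s3", transferred_to_human=False),  # customer left to call the phone line
    BotSession("s4", transferred_to_human=True),   # escalated to an agent
]

rate = containment_rate(sessions)
print(f"{rate:.0%}")  # 75%
```

Only one of the three contained sessions ended with a solved problem, yet all three count the same way.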

Where does it mislead?

Three places. First, customers who give up and close the chat in frustration count as contained. Second, customers who try one channel, fail, and try another are counted as contained in the channel they left. Third, customers who get a wrong answer they only discover later count as contained in the moment, even though the contact will reappear in some other form.

Why is this getting worse with AI?

Because the bots are better at sounding helpful. When a scripted chatbot failed, it failed obviously: the customer escalated, and the metric reflected reality. A fluent AI agent gives a plausible-looking reply, the customer closes the chat, and the system writes that down as a win. The fluency hides the failure.

What should you measure instead?

Two things. Re-contact rate across channels: did the customer come back, on any channel, about the same problem, within a set time window? And first contact resolution, measured from the customer's perspective rather than from a ticket being closed. Both are harder to instrument than containment, which is why they get skipped. Both are closer to what the customer experienced.
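Cross-channel re-contact rate can be sketched roughly as follows. This assumes contacts from every channel can be joined on a customer identifier and tagged with an issue label, which is usually the hard part in practice. The Contact record, its field names, and the seven-day default window are all illustrative assumptions, not a standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Contact:
    """Hypothetical unified contact record across chat, phone, email, etc."""
    customer_id: str
    issue: str          # assumed issue label, e.g. from tagging or intent detection
    channel: str
    started_at: datetime


def recontact_rate(contacts, window=timedelta(days=7)):
    """Share of contacts followed by another contact from the same customer,
    about the same issue, on any channel, within the window."""
    groups = {}
    for c in contacts:
        groups.setdefault((c.customer_id, c.issue), []).append(c)

    total = 0
    recontacted = 0
    for group in groups.values():
        group.sort(key=lambda c: c.started_at)
        # Pair each contact with the next one in the same group (None for the last).
        for current, following in zip(group, group[1:] + [None]):
            total += 1
            if following is not None and following.started_at - current.started_at <= window:
                recontacted += 1
    return recontacted / total if total else 0.0


contacts = [
    Contact("A", "billing", "chat", datetime(2024, 3, 1, 14, 0)),
    Contact("A", "billing", "phone", datetime(2024, 3, 1, 14, 30)),  # same problem, 30 minutes later
    Contact("B", "delivery", "chat", datetime(2024, 3, 1, 10, 0)),   # never came back
    Contact("C", "refund", "chat", datetime(2024, 3, 1, 9, 0)),
    Contact("C", "refund", "chat", datetime(2024, 3, 11, 9, 0)),     # 10 days later, outside the window
]

rate = recontact_rate(contacts)
print(f"{rate:.0%}")  # 20%
```

Customer A's chat would count as contained on a bot dashboard. Here it counts as a re-contact, because the phone call 30 minutes later is visible in the same data.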

To see what the gap costs in money, the cost-per-resolution simulator separates a claimed containment rate from the share genuinely solved and prices the difference.
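The arithmetic behind that kind of comparison can be sketched in a few lines. This is not the simulator's actual model; it is a simplified illustration that assumes every contained-but-unsolved session eventually returns as one human-handled contact. All figures and parameter names below are hypothetical.

```python
def price_containment_gap(sessions, bot_cost_per_session,
                          claimed_containment, genuine_resolution,
                          human_cost_per_contact):
    """Compare cost per resolution at the claimed rate and at the genuine rate.

    Simplifying assumption: each session that was contained but not solved
    comes back once as a human-handled contact.
    """
    if not 0 < genuine_resolution <= claimed_containment <= 1:
        raise ValueError("expected 0 < genuine_resolution <= claimed_containment <= 1")

    bot_spend = sessions * bot_cost_per_session
    claimed_resolved = sessions * claimed_containment
    genuinely_resolved = sessions * genuine_resolution
    hidden_recontacts = claimed_resolved - genuinely_resolved

    return {
        "claimed_cost_per_resolution": bot_spend / claimed_resolved,
        "genuine_cost_per_resolution": bot_spend / genuinely_resolved,
        "hidden_recontact_cost": hidden_recontacts * human_cost_per_contact,
    }


# Hypothetical inputs: 1,000 bot sessions at 0.50 each, 75% claimed containment,
# 50% genuinely solved, 6.00 per human-handled contact.
result = price_containment_gap(
    sessions=1000,
    bot_cost_per_session=0.50,
    claimed_containment=0.75,
    genuine_resolution=0.50,
    human_cost_per_contact=6.00,
)

for name, value in result.items():
    print(f"{name}: {value:.2f}")
# claimed_cost_per_resolution: 0.67
# genuine_cost_per_resolution: 1.00
# hidden_recontact_cost: 1500.00
```

With these made-up inputs, the bot's cost per genuine resolution is half again what the dashboard implies, and the unsolved sessions add a human-handling bill that the containment figure never shows.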

Related: the field note on deflection rate, four metrics that beat deflection rate, and the glossary explainer on containment rate.