Voice AI works well on routine, well-defined contact-centre calls (order status, password resets, simple bookings) when latency is low and the AI is connected to live data. It works badly on complex calls, emotional calls, and any call where the customer's speech is hard to recognise.

A customer phones a retailer to ask where their parcel is. The voice AI picks up, recognises the order number, reads the status off the order system, and tells the customer the parcel is out for delivery. The call lasts twenty seconds. A different customer phones the same line about a damaged item, becomes upset, switches between English and Spanish mid-sentence, and the same voice AI starts looping. Both calls reach the same product. They have very different outcomes.

What people in the field are saying

Service Matters covers how operators are choosing conversational and voice AI in "How winning companies choose a conversational...". Honest discussion of where voice AI breaks tends to live in long-form posts like kdschemin's "Your AI support agent closed the...".

What does voice AI do well?

Routine calls with a clear intent and a clean data lookup. Order status. Account balance. Appointment booking. The AI hears the request, identifies the customer, queries the system, and replies. On these calls, handle time is now close to that of a competent human agent, and availability is better.
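The hear, identify, query, reply sequence can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the keyword classifier, the in-memory ORDERS table, and the HANDOVER marker are all hypothetical stand-ins for a real language-understanding model, a live order system, and a transfer to a human.

```python
from typing import Optional

# Hypothetical stand-in for a live order system (assumption, not a real API).
ORDERS = {"A1001": "out for delivery", "A1002": "delivered"}

# Intents this sketch treats as routine enough to answer without a human.
ROUTINE_INTENTS = {"order_status"}


def classify_intent(utterance: str) -> str:
    """Toy keyword classifier; a real system would use an NLU model."""
    text = utterance.lower()
    if "parcel" in text or "order" in text:
        return "order_status"
    return "unknown"


def handle_call(utterance: str, order_id: Optional[str]) -> str:
    """Answer a routine request from live data, or signal a handover."""
    intent = classify_intent(utterance)
    # Anything outside the routine set, or without an identifier, goes to a person.
    if intent not in ROUTINE_INTENTS or order_id is None:
        return "HANDOVER"
    status = ORDERS.get(order_id)
    if status is None:
        # The lookup failed, so the AI has nothing reliable to say.
        return "HANDOVER"
    return f"Your order {order_id} is {status}."
```

The design choice worth noting is that every uncertain branch ends in a handover instead of a guess. The sketch only answers when the intent is routine and the lookup succeeds.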

Where does it break?

Four places. First, accent and audio quality: the customer's speech is hard to recognise (strong accent, noisy environment, poor phone line) and the conversation degrades. Second, complex intent: the request needs multiple steps or judgement, and the AI either tries and fails or refuses without explaining. Third, emotional calls: the customer is upset, the AI cannot respond to that, and the call gets worse. Fourth, latency: the gap between the customer finishing a sentence and the AI replying feels wrong, and the call sounds broken even when the words are right.
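Of the four, latency is the easiest to measure directly. The sketch below assumes you can extract, for each turn, the time the customer stopped speaking and the time the AI started replying. The function names and the one-second threshold are illustrative assumptions, not an industry standard.

```python
from statistics import median


def response_gaps(turns):
    """Return the reply gap in seconds for each (customer_end_s, ai_start_s) pair."""
    return [ai_start - customer_end for customer_end, ai_start in turns]


def slow_turns(turns, threshold_s=1.0):
    """Return the indexes of turns whose reply gap exceeds threshold_s.

    The default threshold is an assumption chosen for illustration only.
    """
    return [i for i, gap in enumerate(response_gaps(turns)) if gap > threshold_s]


# Example call: three turns, timestamps in seconds from the start of the call.
call = [(2.0, 2.5), (6.0, 8.0), (11.0, 11.75)]
gaps = response_gaps(call)
typical_gap = median(gaps)
flagged = slow_turns(call)
```

In this example only the second turn is flagged: the AI took two seconds to reply, against half a second and three quarters of a second on the other turns.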

What is harder than vendors say?

The handover to a human. A well-deployed voice AI passes the call to a person with the customer's identity, the conversation context, and what was tried. A poorly deployed one drops the customer back into a queue with nothing, and the customer starts over. The latter is more common.
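One way to make the handover checkable is to treat it as a record with required contents. The sketch below is an assumption about how such a record might look; the Handover class and its field names are hypothetical, and are not taken from any particular contact-centre platform.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Handover:
    """What the human agent receives when the AI transfers a call."""
    customer_id: Optional[str]    # who the caller is, if the AI identified them
    transcript: List[str]         # the conversation so far
    attempted_actions: List[str]  # what the AI already tried
    reason: str                   # why the call is being transferred

    def is_complete(self) -> bool:
        """True only if identity, context, and attempts are all present."""
        return (
            bool(self.customer_id)
            and bool(self.transcript)
            and bool(self.attempted_actions)
        )


good = Handover(
    customer_id="C-204",
    transcript=["Customer: my item arrived damaged", "AI: I can see order A1001"],
    attempted_actions=["looked up order A1001"],
    reason="damage claim needs a human decision",
)

# A cold drop into the queue: the agent gets nothing and the customer starts over.
cold = Handover(customer_id=None, transcript=[], attempted_actions=[], reason="unknown")
```

Counting the share of transfers where is_complete() is false gives a rough measure of how often customers are made to start over.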

How do you test it before deploying?

Take recordings of real calls, including the harder ones, and replay them through the system. Listen for misheard words, slow replies, weak handovers, and confidently wrong answers. The gap between a vendor demo and live calls is large, and the demo never includes a customer with a strong accent calling from a busy room.
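Misheard words can be counted as well as listened for. A common measure is word error rate: the number of word substitutions, deletions, and insertions needed to turn the system's transcript into a human-checked reference, divided by the number of words in the reference. The sketch below assumes you already have both transcripts as plain text. It lowercases and splits on whitespace, which is a simplification of the normalisation a real evaluation would need.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


clean = word_error_rate("where is my parcel", "where is my parcel")
misheard = word_error_rate("where is my parcel", "where is my partial")
```

Comparing the rate on clean recordings with the rate on accented or noisy ones shows how much recognition degrades on the calls a demo leaves out.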

Related: the field note on the accent gap in voice AI, the glossary explainer on how voice AI works, and use case 2: checking an order or account status.