AI changes the contact-centre QA function in two big ways. First, the AI itself is now one of the things being QA'd, so the function has to cover machine behaviour as well as human behaviour. Second, coverage goes from a sample to every conversation, because AI-assisted QA can score every contact. The team that runs QA shifts from spot-checking to pattern-watching.
A QA lead used to listen to ten calls a week per agent and score them against a rubric. Three things changed. AI now answers many of the calls, so the AI gets its own scorecard. AI-assisted QA reads every conversation, not ten of them. The lead's time moves from rating individual contacts to spotting where the AI is drifting and where the team's quality has shifted in ways that need a coaching response.
What people in the field are saying
kdschemin's "Reliability is the product" frames the AI's own quality as a continuous reliability problem, not a project. The QA function now owns that reliability problem alongside human-agent quality.
What does the AI need to be scored on?
Five dimensions, broadly: factual accuracy (did the AI quote the policy correctly?), action correctness (did the AI take the right action, and did it actually happen?), tone (did the AI sound the way you want a brand representative to sound?), containment quality (did the customer get a resolution, or did they leave the chat without one?), and escalation quality (when the AI handed over, did it do so well?).
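As a rough illustration, the five dimensions can be held in a small scorecard record. This is a minimal sketch, not a standard schema: the class name AIScorecard, the 0.0 to 1.0 scale, the equal weighting in overall(), and the choice to leave escalation quality empty when no handover happened are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AIScorecard:
    """Hypothetical per-conversation scorecard for the AI (scores assumed 0.0 to 1.0)."""
    factual_accuracy: float      # did the AI quote the policy correctly?
    action_correctness: float    # right action taken, and did it actually happen?
    tone: float                  # did it sound like a brand representative?
    containment_quality: float   # did the customer get a resolution?
    escalation_quality: Optional[float] = None  # None when the AI never handed over

    def overall(self) -> float:
        """Unweighted mean of the dimensions that apply (equal weights are an assumption)."""
        scores = [
            self.factual_accuracy,
            self.action_correctness,
            self.tone,
            self.containment_quality,
        ]
        if self.escalation_quality is not None:
            scores.append(self.escalation_quality)
        return sum(scores) / len(scores)


# A contained conversation with no handover: escalation quality is left out.
card = AIScorecard(factual_accuracy=1.0, action_correctness=1.0,
                   tone=0.5, containment_quality=0.5)
print(card.overall())
```

In practice a team would probably weight the dimensions differently, for example treating a factual error as more serious than an off-brand tone.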
What does full-coverage QA actually look like?
Every conversation is scored by a model against the QA rubric. The output is a per-conversation score, a per-agent (or per-AI) trend, and a list of conversations to spot-check by hand because the model flagged them as unusual. The human QA reviewer reads those, not a random ten.
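The three outputs described above can be sketched as a small aggregation step. This assumes the scoring model has already produced a score and an "unusual" flag for each conversation; the record fields, the weekly bucketing used for the trend, and the function name summarise are hypothetical choices for the example, not a description of any particular QA product.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class ScoredConversation:
    """Hypothetical output of the scoring model for one conversation."""
    conversation_id: str
    agent_id: str    # a human agent or the AI itself
    week: int        # assumed bucketing for the trend
    score: float     # per-conversation score against the rubric
    flagged: bool    # model marked the conversation as unusual


def summarise(scored):
    """Return (trend, review_queue).

    trend maps (agent_id, week) to the mean score for that agent in that week.
    review_queue lists the conversation ids flagged for a human spot-check.
    """
    by_agent_week = defaultdict(list)
    for conv in scored:
        by_agent_week[(conv.agent_id, conv.week)].append(conv.score)
    trend = {key: mean(values) for key, values in by_agent_week.items()}
    review_queue = [conv.conversation_id for conv in scored if conv.flagged]
    return trend, review_queue


conversations = [
    ScoredConversation("c1", "ai-bot", 1, 1.0, False),
    ScoredConversation("c2", "ai-bot", 1, 0.5, True),
    ScoredConversation("c3", "agent-7", 1, 0.75, False),
]
trend, review_queue = summarise(conversations)
print(trend)
print(review_queue)
```

The human reviewer then works from review_queue rather than a random sample, and the trend is what the QA lead watches for drift.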
Where does AI-assisted QA fall short?
It is over-confident on the easy cases (the conversation that clearly went well) and over-cautious on the edge cases (the conversation that needed creativity from the agent). Pure automation produces flat scores. The working pattern is to combine automated coverage with human review of the flagged cases.
What changes for the QA function itself?
The work moves up the abstraction ladder. Spot-checking goes to the machine. Calibrating the rubric, training the team based on patterns, and tuning the AI based on its own failures become the human jobs. The headcount on QA may shrink, but the seniority of the people in those roles tends to rise.
Related: the field note on metrics beyond deflection rate, why containment numbers are misleading, and the glossary explainer on first contact resolution.