An AI customer service tool should be bought on one measured thing: how often it actually solves the customer's problem, on your own data. Demos and deflection rates both reward the look of working. Resolution is the test, and it belongs in the contract, not just the evaluation.

The usual AI buying process rests on two things: the demo and a projected deflection rate. Both can look excellent for a tool that resolves little. This article is about replacing them with a single requirement the vendor has to meet, and writing it into the deal.

Why the demo and the deflection rate both mislead

Mark Levy, who writes Decoding Customer Experience, warns that an AI can sound smart and still not do the job. A demo is built from questions the tool is ready for, so every modern tool demos well. It tests fluency, not resolution.

The deflection rate has the same flaw from the other side. It counts the contacts the AI closed without a human, whether or not the problem was solved. A tool that talks customers into giving up scores a high deflection rate while resolving nothing. Buy on the demo and you buy fluency. Buy on projected deflection and you buy containment. Neither is the thing you actually want, which is the customer's problem solved.
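The gap between the two numbers is easy to see once both are computed from the same contacts. The following is a minimal sketch, not a vendor's reporting format: the Contact fields and the ten sample records are made up for illustration, and it assumes a human reviewer has already decided whether the tool solved each problem.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    closed_without_human: bool  # the AI ended the contact with no handoff
    solved_by_tool: bool        # reviewer's verdict (hypothetical field)


def deflection_rate(contacts):
    """Share of contacts the AI closed without a human."""
    return sum(c.closed_without_human for c in contacts) / len(contacts)


def resolution_rate(contacts):
    """Share of contacts where the tool solved the customer's problem."""
    return sum(c.solved_by_tool for c in contacts) / len(contacts)


# Illustrative, made-up sample of 10 contacts.
sample = (
    [Contact(True, True)] * 3      # closed by the AI, problem solved
    + [Contact(True, False)] * 5   # closed by the AI, customer gave up
    + [Contact(False, False)] * 2  # handed to a human
)

print(f"deflection: {deflection_rate(sample):.0%}")  # 80%
print(f"resolution: {resolution_rate(sample):.0%}")  # 30%
```

In this invented sample the tool reports 80% deflection while solving 30% of problems. The five contacts that were closed but not solved are exactly what a deflection figure hides.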

The test: measured resolution on your own data

Resolution can be measured, and it has to be measured on your data, not the vendor's. The test is straightforward. Take a sample of your own real customer contacts, including the hard and unusual ones. Run them through the vendor's tool. Have a human reviewer judge each outcome on one question: was the customer's actual problem solved? The share of contacts judged solved is the tool's resolution rate for you.
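The arithmetic of the test is simple enough to keep in a spreadsheet or a few lines of code. The sketch below assumes the reviewer's verdicts are recorded as true or false, and it adds one thing the text does not require: a difficulty label per contact, so the rate on hard and unusual contacts can be read separately. The contact IDs, labels, and verdicts are invented.

```python
from collections import defaultdict

# Each entry: (contact ID, difficulty label, reviewer verdict on
# "was the customer's actual problem solved?"). All values are made up.
reviewed = [
    ("c01", "routine", True),
    ("c02", "routine", True),
    ("c03", "routine", False),
    ("c04", "hard", True),
    ("c05", "hard", False),
    ("c06", "hard", False),
    ("c07", "unusual", False),
    ("c08", "unusual", True),
]


def share_resolved(verdicts):
    """Share of reviewed contacts that the reviewer judged solved."""
    verdicts = list(verdicts)
    if not verdicts:
        raise ValueError("no reviewed contacts")
    return sum(verdicts) / len(verdicts)


overall = share_resolved(solved for _, _, solved in reviewed)

# Group verdicts by difficulty so a weak result on hard cases is visible.
by_difficulty = defaultdict(list)
for _, difficulty, solved in reviewed:
    by_difficulty[difficulty].append(solved)

print(f"overall: {overall:.0%}")
for difficulty, verdicts in by_difficulty.items():
    print(f"{difficulty}: {share_resolved(verdicts):.0%} of {len(verdicts)} contacts")
```

The overall figure is the one to compare across vendors. The breakdown is optional, but it shows whether a respectable overall rate is being carried by the easy contacts.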

That number is comparable across vendors, grounded in your real contacts, and immune to the demo effect. If a vendor will not let you run this test on your own data, that refusal is itself a result.

Put the number in the contract

Running the test at evaluation is good. Writing the result into the contract is what makes it bind. The measured resolution rate becomes a committed figure. The vendor agrees the tool will hit it on your contacts, it is checked on a defined cadence after launch, and falling below it has a defined consequence: a remedy period, a price adjustment, or an exit right.

This changes the vendor's incentive. A resolution rate that lives only in a pre-sale evaluation is something the vendor wants to look good once. A resolution rate in the contract is something the vendor has to keep true. It also protects you from drift: an AI tool's performance moves over time, and a contractual resolution rate gives you a defined point at which a decline becomes the vendor's problem, not yours.
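A contractual check of this kind reduces to comparing each review period's measured rate with the committed figure. The sketch below is one possible way to express that, not contract language: the ResolutionClause type, the 70% figure, the quarterly cadence, and the "30-day remedy period" wording are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolutionClause:
    committed_rate: float  # the figure written into the contract
    consequence: str       # what the contract says happens below that figure


def check_period(clause, resolved, reviewed):
    """Return (measured rate, consequence or None) for one review period."""
    if reviewed <= 0 or not 0 <= resolved <= reviewed:
        raise ValueError("need reviewed > 0 and 0 <= resolved <= reviewed")
    measured = resolved / reviewed
    triggered = clause.consequence if measured < clause.committed_rate else None
    return measured, triggered


# Hypothetical clause and two hypothetical quarterly reviews of 50 contacts.
clause = ResolutionClause(committed_rate=0.70, consequence="30-day remedy period")

for period, resolved, reviewed in [("Q1", 37, 50), ("Q2", 33, 50)]:
    measured, triggered = check_period(clause, resolved, reviewed)
    outcome = triggered or "commitment met"
    print(f"{period}: {measured:.0%} -> {outcome}")
```

In this example the first quarter clears the committed figure and the second falls below it, which is the point at which the decline becomes the vendor's problem.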

What to do before you sign: assemble 50 of your own real customer contacts, weighted toward the hard ones. Have every shortlisted vendor's tool handle them, and have a human reviewer score each outcome as resolved or not. Take the resolution rate of the winner and write it into the contract as a committed figure, with a checking cadence and a defined consequence for falling below it.
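The comparison step can be sketched as follows. The vendor names and verdicts are invented, and the lists hold 10 verdicts each to keep the example short, where a real run would hold 50. The sketch assumes every vendor is scored on the same contacts in the same order.

```python
# Reviewer verdicts per vendor on the same contacts, in the same order.
# Vendor names and verdicts are made up for illustration.
verdicts = {
    "vendor_a": [True, True, False, True, False, True, True, False, True, True],
    "vendor_b": [True, False, False, True, False, True, False, False, True, True],
    "vendor_c": [True, True, True, True, False, True, True, False, True, True],
}

# A fair comparison needs every vendor scored on the same sample.
sample_sizes = {len(v) for v in verdicts.values()}
if len(sample_sizes) != 1:
    raise ValueError("every vendor must be scored on the same set of contacts")

rates = {vendor: sum(v) / len(v) for vendor, v in verdicts.items()}
winner = max(rates, key=rates.get)

for vendor, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{vendor}: {rate:.0%}")
print(f"committed figure for the contract: {rates[winner]:.0%} ({winner})")
```

The last line is the number that moves from the evaluation into the contract.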

Buy the outcome, not the appearance

Demos and deflection rates persist as buying criteria because they are easy and they flatter the purchase. They measure how a tool appears, and an AI tool can appear to work while resolving very little.

Resolution measured on your own data is harder to produce and harder to game, which is exactly why it is the criterion worth using. Making it a contractual requirement is what keeps the tool honest after the sale is closed.