The most important decision in an AI customer service project is not which tool to buy. It is agreeing, in writing and before any money is spent, how you will judge whether the project worked. Teams that skip this step end up measuring success with whatever numbers happen to look good later.

Plenty has been written about why AI projects fail to pay off. Almost all of it looks backward, after the rollout, asking what went wrong. A simpler problem hides in front of that one. Most projects never agreed, up front, what paying off would even look like.

This article is about the step that comes before the spending: writing down how the project will be judged, while you can still be honest about it. Call it the pre-deployment measurement agreement.

Measuring after the fact always flatters the project

The common assumption is that you will know success when you see it. You will not. Once a project is live and the budget is spent, nobody on the team wants it to have failed. So the team reaches for the numbers that look good: tickets closed, average handle time, cost per contact. Those numbers almost always improve, because the AI is doing something.

What gets quietly dropped is the harder question of whether customers were actually helped. Michael Howlett, who writes Customer Experience Decoded, calls this the AI metrics mirage: the dashboard looks strong while customers leave. Measuring after the fact does not just risk this outcome. It almost guarantees it, because the people choosing the metrics now have a reason to choose the flattering ones.

What to agree before you spend

Three things, written down and signed off before the budget is approved.

First, the comparison. What would the next year look like if you did not run this project at all, in customer numbers, costs, and complaint levels? This is the line the project has to beat. Without it, any improvement looks like a win, including one that would have happened anyway.
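The arithmetic behind the do-nothing comparison is small enough to sketch. The Python below is only an illustration: the function name `split_change` and the complaint figures are hypothetical and not taken from any real project. It separates the change a dashboard would show from the part the project can actually claim.

```python
def split_change(before: float, after: float, do_nothing_forecast: float) -> dict[str, float]:
    """Split an observed change into drift and project effect.

    before:              the metric's value before the project started
    after:               the value observed once the project is live
    do_nothing_forecast: the value agreed in advance as what would have
                         happened without the project (an assumption the
                         team writes down before spending)
    """
    return {
        # What a dashboard shows if you only compare before and after.
        "naive_change": after - before,
        # The part of the change the forecast says was coming anyway.
        "would_have_happened_anyway": do_nothing_forecast - before,
        # What is left for the project to claim.
        "attributable_to_project": after - do_nothing_forecast,
    }


# Hypothetical monthly complaint counts, for illustration only.
result = split_change(before=1000, after=900, do_nothing_forecast=920)
print(result)
```

With these made-up numbers, the naive reading credits the project with 100 fewer complaints a month. Against the do-nothing line it can claim 20.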

Second, the real success metrics. Pick the numbers that show whether customers were helped, not just processed. Howlett's list is a good start: first-contact resolution by channel, how often customers come back within a few days, the quality of escalations, and complaints or churn in the weeks after an AI contact. Write these down now, because they are harder to game and nobody will volunteer them later.
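Two of these metrics can be computed from an ordinary contact log, which is part of why they are worth defining precisely up front. The sketch below assumes a hypothetical `Contact` record and treats "a few days" as a three-day window. Both are illustrative choices, and the agreement itself should fix the real definitions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Contact:
    # Hypothetical record shape; real ticketing exports will differ.
    customer_id: str
    channel: str
    opened_at: datetime
    resolved_first_contact: bool


def fcr_by_channel(contacts: list[Contact]) -> dict[str, float]:
    """First-contact resolution rate per channel."""
    totals: dict[str, int] = defaultdict(int)
    resolved: dict[str, int] = defaultdict(int)
    for c in contacts:
        totals[c.channel] += 1
        if c.resolved_first_contact:
            resolved[c.channel] += 1
    return {channel: resolved[channel] / totals[channel] for channel in totals}


def repeat_contact_rate(contacts: list[Contact],
                        window: timedelta = timedelta(days=3)) -> float:
    """Share of contacts followed by another contact from the same
    customer within `window` (three days here is an assumption)."""
    times_by_customer: dict[str, list[datetime]] = defaultdict(list)
    for c in contacts:
        times_by_customer[c.customer_id].append(c.opened_at)

    total = 0
    followed = 0
    for times in times_by_customer.values():
        times.sort()
        for i, opened in enumerate(times):
            total += 1
            if i + 1 < len(times) and times[i + 1] - opened <= window:
                followed += 1
    return followed / total if total else 0.0


# Made-up sample log, for illustration only.
log = [
    Contact("a", "chatbot", datetime(2024, 1, 1), False),
    Contact("a", "phone", datetime(2024, 1, 2), True),
    Contact("b", "chatbot", datetime(2024, 1, 1), True),
    Contact("c", "chatbot", datetime(2024, 1, 5), True),
    Contact("c", "phone", datetime(2024, 1, 20), True),
]
fcr = fcr_by_channel(log)
repeat_rate = repeat_contact_rate(log)
print(fcr, repeat_rate)
```

In this sample, customer "a" was marked as handled by the chatbot and phoned back the next day. A count of tickets closed would miss that, and the repeat-contact rate catches it.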

Third, who signs off. Name the person accountable for the comparison and the metrics. Not the vendor, not the project's champion. Someone who can later say, plainly, whether the project hit the line it agreed to.

What to do before the budget meeting: write a one-page measurement agreement. It states the do-nothing comparison, the four or five real success metrics with a target for each, and the name of the person who signs off. Everyone approving the spend signs the same page. If the project cannot say what success looks like before it starts, it is not ready to start.
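For teams that keep this kind of document alongside their project records, the one-page agreement can also be written as a small data structure. What follows is a minimal sketch, not a standard format: the class names, field names, metric names, targets, and job titles are all hypothetical. It checks whether the page is complete enough to approve and, later, whether each agreed target was met.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Metric:
    name: str
    target: float
    higher_is_better: bool = True

    def met(self, observed: float) -> bool:
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


@dataclass
class MeasurementAgreement:
    do_nothing_comparison: str   # the line the project has to beat
    metrics: list[Metric]        # real success metrics, each with a target
    sign_off_owner: str          # the person accountable for the verdict
    approvers: list[str]         # everyone approving the spend
    signed_by: set[str] = field(default_factory=set)

    def problems(self) -> list[str]:
        """Reasons the project is not yet ready to start (empty if none)."""
        issues = []
        if not self.do_nothing_comparison.strip():
            issues.append("no do-nothing comparison")
        if not self.metrics:
            issues.append("no success metrics")
        if not self.sign_off_owner.strip():
            issues.append("no named sign-off owner")
        missing = [a for a in self.approvers if a not in self.signed_by]
        if missing:
            issues.append("unsigned approvers: " + ", ".join(missing))
        return issues

    def verdict(self, observed: dict[str, float]) -> dict[str, bool]:
        """Per-metric pass or fail; a metric nobody reported counts as a fail."""
        return {
            m.name: m.name in observed and m.met(observed[m.name])
            for m in self.metrics
        }


# Illustrative values only.
agreement = MeasurementAgreement(
    do_nothing_comparison="12-month forecast of costs and complaints without the project",
    metrics=[
        Metric("first_contact_resolution_chat", target=0.70),
        Metric("repeat_contact_rate_3d", target=0.15, higher_is_better=False),
    ],
    sign_off_owner="Head of Customer Operations",
    approvers=["CFO", "COO"],
    signed_by={"CFO"},
)
print(agreement.problems())
print(agreement.verdict({"first_contact_resolution_chat": 0.72,
                         "repeat_contact_rate_3d": 0.18}))
```

Counting an unreported metric as a fail is a deliberate choice in this sketch: it stops a harder metric from quietly disappearing after launch.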

Why this is hard, and worth it

This step is unpopular for an obvious reason. It is the one moment in the project where someone has to commit to a number they can be judged against. It is far more comfortable to start spending and sort out the measurement later. That comfort is exactly the problem.

Kevin Davis, who writes the KD Be Schemin newsletter, argues that the return on AI is rarely missing because the technology failed. In his words, the ROI is not missing, the redesign is: the work around the technology (the measurement, the accountability, the change in how people work) was never done. The pre-deployment measurement agreement is a small piece of that work. It is also the piece that decides whether you will ever know if the rest of it paid off.

The point of the agreement is to fix the measure of success while the project still has no result to defend. Once it is live, that honesty is much harder to find.