An AI customer service pilot that works is not most of the way to production. The pilot ran on a small, clean slice of data. Production needs the whole messy dataset made reliable, and that work is the larger cost, counted in engineering time and calendar time.

A pilot succeeds, the demo lands, and the project plan treats the rest as a rollout: turn it on for everyone. Months later the rollout has stalled, and the reason is rarely the model. It is the data underneath, which the pilot was carefully set up to avoid.

This article maps the real cost of crossing from a working pilot to a working production system, so the budget reflects the project you actually have.

What the pilot quietly left out

A pilot is designed to show the model can do the task. To do that cleanly, it runs on a controlled slice of data: a few hundred records, checked by hand, for one well-behaved use case. That is the right way to test a model. It also means the pilot proved one thing only, and it is not the thing production depends on.

The common belief is that a successful pilot has retired the technical risk. It has retired the model risk. The data risk is untouched, because the pilot was built to keep the messy data out of view.

Production is a data-foundation project

In production, the AI agent meets the real dataset: the same customer recorded three ways across four systems, records with no clean history, fields that mean different things in different regions. Making that usable is the work. Kevin Davis, who writes KD Be Schemin, lists what it involves in his piece on the non-negotiables for AI to succeed: a resolvable customer identity, a current source of truth, readable history, and access controls.

Elsewhere, Davis makes a sharper point: for a system acting on customer data, reliability is the product. A dashboard built on shaky data shows a wrong number. An AI agent built on shaky data takes a wrong action, in front of a customer. Production has to make the data reliable because the cost of being wrong has gone up.
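To make "the same customer recorded three ways" concrete, here is a minimal sketch of identity resolution. The records, the field names, and the choice of a normalised email address as the match key are all assumptions made for illustration. Real matching usually needs more than one key and some fuzzy comparison, which is part of why this stage takes weeks.

```python
from collections import defaultdict

# Hypothetical records: one customer as three systems happen to store them,
# plus a second, unrelated customer. All names and IDs are invented.
records = [
    {"system": "crm", "id": "C-101", "email": "Ana.Ruiz@example.com "},
    {"system": "billing", "id": "B-77", "email": "ana.ruiz@example.com"},
    {"system": "support", "id": "S-9", "email": "ANA.RUIZ@EXAMPLE.COM"},
    {"system": "crm", "id": "C-102", "email": "li.wei@example.com"},
]


def match_key(record):
    """Normalise the email so trivially different spellings compare equal."""
    return record["email"].strip().lower()


def resolve_identities(records):
    """Group records that share a match key into one customer each."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append((record["system"], record["id"]))
    return dict(groups)


customers = resolve_identities(records)
for key, sources in customers.items():
    print(key, sources)
```

Four records collapse into two customers here only because the email happens to be present and consistent. A pilot slice is chosen so that it is. Production data is where that assumption fails.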

Where the cost actually sits

The cost of crossing the gap is mostly two things, and neither is the model licence.

The first is engineering time: the weeks of work to resolve customer identity across systems, build a current source of truth for what each customer is owed, connect a readable history, and put access controls and monitoring in place. The second is calendar time: that work runs in sequence, it depends on other teams, and it cannot be rushed by spending more on the model. The model licence is a small, predictable line. The data foundation is the large, variable one, and it is the line most pilot business cases leave blank.

This is also why the field is full of stalled deployments. Mark Levy, in Decoding Customer Experience, has noted that the reason most enterprises fail to scale AI agents to real value is not model capability. The blocker is the data plumbing, and the plumbing is exactly what the pilot skipped.

Before the rollout is funded, take the one workflow the pilot proved and write out the stages between it and production: resolve identity, build the entitlement source of truth, connect history, add access controls, add monitoring. Put an engineering estimate and an elapsed-time estimate on each stage. That total, not the model licence, is the real cost of the project, and it is the number the pilot business case should have carried.
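The roll-up can be sketched in a few lines. Every number below is a placeholder, not a benchmark, and the weekly rate is hypothetical. The sketch also assumes the stages run strictly one after another, so elapsed time is a simple sum. Replace the figures with your own estimates.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    engineer_weeks: int  # effort: person-weeks of engineering work
    elapsed_weeks: int   # calendar time, including waiting on other teams


# Placeholder estimates for illustration only.
stages = [
    Stage("resolve identity", 6, 8),
    Stage("build entitlement source of truth", 8, 10),
    Stage("connect history", 4, 6),
    Stage("add access controls", 3, 4),
    Stage("add monitoring", 3, 4),
]

WEEKLY_RATE = 4000  # hypothetical loaded cost of one engineer-week


def roll_up(stages, weekly_rate):
    """Total effort, total elapsed time, and engineering cost.

    Elapsed time is summed because the stages are assumed to be sequential.
    """
    effort = sum(s.engineer_weeks for s in stages)
    elapsed = sum(s.elapsed_weeks for s in stages)
    return effort, elapsed, effort * weekly_rate


effort, elapsed, cost = roll_up(stages, WEEKLY_RATE)
print(f"{effort} engineer-weeks over {elapsed} calendar weeks, cost {cost}")
```

Keeping effort and elapsed time as separate columns matters. A stage can be cheap in engineer-weeks and still hold up the schedule because it waits on another team.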

How to read a successful pilot

A successful pilot is genuinely good news. It answers one question well: can the model do the task when the data is clean. It says nothing about the question production turns on, which is whether you can make the data clean enough, everywhere, and keep it that way.

Budgeting for that second question is what separates a pilot that ships from one that quietly stalls a year later. The data foundation is the project. The pilot was the part that came free.