- How do we baseline before we ship?
- Before any model goes near production, we instrument the current state: cycle times, error rates, labor hours, decision latency, whatever the use case actually moves. We capture six to twelve weeks of pre-deployment data where the workflow allows, or a matched control where it doesn't. The baseline becomes the contract; the post-deployment number is measured against it, not against a hypothetical.
- Which costs are real, which are theatrical?
- Real costs are fully loaded: inference, fine-tuning, evaluation infrastructure, the engineers maintaining it, the analysts reviewing outputs, and the opportunity cost of the team that built it. Theatrical costs are the ones in vendor decks that quietly exclude observability, human review, and rework. We build the all-in unit economics so the CFO sees the same number we do.
- When do we double down vs. when do we kill an initiative?
- We set kill criteria at the start of the engagement, not at the post-mortem. If the system misses its baseline delta after a defined runway (typically two quarters of production use), we recommend sunset. If it clears the bar, we model what the next dollar of investment buys and stack-rank against other initiatives. The discipline is making the call with the same evidence either way.
- How do we report AI ROI to the board without making things up?
- Three numbers, all defensible: incremental margin or cost out (measured against the baseline), all-in cost to operate (not just the build), and the residual risk on the books (model drift, vendor concentration, compliance exposure). We help draft the board narrative so it survives questioning from the audit committee, not just the technology committee.
- How long until a value realization framework starts producing usable numbers?
- Six to ten weeks to baseline and instrument; another quarter of production data before the numbers carry signal. We resist publishing ROI earlier than that. Premature claims are the fastest way to lose credibility with the CFO's office, and once lost, it doesn't come back in the same fiscal year.
- Do you measure adoption separately from value?
- Yes. Adoption is a leading indicator; value is the lagging one. We track both because a system can be heavily used and produce negligible margin impact, or lightly used and move the number that matters. Conflating them is how AI programs end up reporting activity instead of outcomes.
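
To make the arithmetic behind the baseline delta, the all-in cost, and the kill criteria concrete, here is a minimal sketch in Python. Every field name, threshold, and figure in it is a hypothetical placeholder chosen for illustration, not a number from a real engagement; the two-quarter runway default simply mirrors the "typically two quarters" mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    # All fields are illustrative assumptions, not a prescribed schema.
    baseline_cost_per_unit: float   # measured before deployment
    post_cost_per_unit: float       # measured in production
    units_per_quarter: int          # volume the workflow handles
    all_in_cost_per_quarter: float  # inference, people, review, rework
    target_delta_per_unit: float    # bar agreed at the start
    quarters_in_production: int
    runway_quarters: int = 2        # assumed default runway


def measured_delta(i: Initiative) -> float:
    """Per-unit improvement against the baseline, not a hypothetical."""
    return i.baseline_cost_per_unit - i.post_cost_per_unit


def net_value_per_quarter(i: Initiative) -> float:
    """Gross savings minus the all-in cost to operate."""
    return measured_delta(i) * i.units_per_quarter - i.all_in_cost_per_quarter


def recommendation(i: Initiative) -> str:
    """Apply the kill criteria that were fixed up front."""
    if i.quarters_in_production < i.runway_quarters:
        return "keep measuring"
    if measured_delta(i) < i.target_delta_per_unit:
        return "sunset"
    return "model next dollar"


example = Initiative(
    baseline_cost_per_unit=12.0,
    post_cost_per_unit=9.0,
    units_per_quarter=10_000,
    all_in_cost_per_quarter=18_000.0,
    target_delta_per_unit=2.0,
    quarters_in_production=2,
)
print(measured_delta(example))
print(net_value_per_quarter(example))
print(recommendation(example))
```

The point of the sketch is that the target delta and the runway are inputs fixed before launch, so the sunset or invest call falls out of the same measured numbers either way. Adoption would be tracked as a separate series rather than folded into these figures.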