
You Can't Afford to Commit. You Can't Afford to Wait.
Two years deep in AI transformation inside a regulated company. The version of AI adoption sold from conference stages bears almost no resemblance to what it looks like where decisions have to be right. This is the field guide nobody gave you.
Executive Summary
I've spent two years deep in AI transformation. The technology is remarkable. It's also the most demanding commitment my company has ever made — not because it doesn't work, but because the subtle differences between how it works today and how it worked yesterday can be difficult to measure, difficult to understand, and impossible to ignore.
AI is powerful, it's accelerating, and we are still at the very beginning. The organizations that jump on now will have a substantial head start over those that wait — if they get it right. That's a capital-I "if," and it's the reason this article exists.
But the version of AI adoption being sold from conference stages, vendor demos, and countless podcasts bears almost no resemblance to what it actually looks like inside a company where decisions have to be right. Where "close enough" isn't a rounding error — it's a regulatory violation.
The noise is deafening. Firms are declaring themselves "AI native." Executives are making sweeping proclamations to staff, clients, and shareholders about how AI is transforming their operations. Some are announcing massive layoffs on the back of their AI successes. And sure — some of those successes are real. But I'd be willing to bet it's not all as it seems. Not even close.
If you're a CEO, a division head, a CTO at a company where the stakes are real — where your outputs affect consumers, touch compliance, or feed into decisions that regulators can examine — this is the field guide nobody gave you. Not because anyone is hiding the hard parts, but because the people building these tools are focused on the remarkable gains they're achieving — and they're only scratching the surface. Whether driven by pure scientific pursuit or competitive pressure, the race is on. The labs, the vendors, the solution builders — they're moving as fast as they can to bring as much AI capability to life as quickly as possible. We're all in this together, excited and concerned at the same time for both the benefits and the risks.
Here's what two years on the inside actually looks like.
But first, a necessary disclaimer: this article does not attempt to boil the ocean.
The landscape of AI governance spans international standards, regulatory frameworks, risk classification models, bias auditing, security posture management, data residency, model registries, agent identity controls, and continuous compliance monitoring — each evolving on its own track, each deserving its own deep dive.
If you're a leader trying to get your arms around all of it at once, you already understand the problem this article is about. My goal isn't to solve every dimension. It's to help you grasp the scope of what needs to be considered — from the perspective of someone who's been living inside it — and to give you a practical starting point for how to lead through it.
We Know What "Right" Looks Like. That's the Problem.
My company has spent twenty years building data verification products for the mortgage lending industry. We serve thousands of financial institutions. Our platform powers billions of critical decisions in lending, wire fraud detection, tenant screening, watchlist monitoring and more.
When we started, our false positive rates were as high as 15-20%. Today, we're at a fraction of a percent. That didn't happen because we found a shortcut. It happened because we built a deterministic rules engine — layer by layer, year by year — where every single decision is traceable, explainable, and reproducible.
If a consumer is denied a mortgage, the reason has to be 100% understood and disclosed. We know where the data came from. We know what time it was pulled. We know which rule triggered the flag. We can walk an examiner through the entire chain, start to finish. That transparency took two decades to build. It's the foundation everything else sits on.
When you operate in a world that demands this level of precision, you develop a very specific lens for evaluating new technology. You don't ask "what can it do?" You ask "can it do it the same way every time, and can I prove it?"
Every business has its own version of "right" — the standard you've spent years building toward, the bar your customers and regulators hold you to, the thing that makes your operation yours. Whatever that looks like in your world, AI has to meet it. And AI, as it exists today, is still figuring out how to get there consistently.
What We Learned When We Actually Built It
About eighteen months ago, we partnered with one of the major AI labs on a sponsored project to build AI-powered solutions for the work our deterministic engine couldn't reach: verifications too messy to fully automate but too expensive to leave entirely manual. We saw enormous potential. We built working prototypes. We were genuinely excited.
Then we tested them rigorously.
The core problem emerged quickly: agentic inconsistency. You ask the model a question, you get an answer. You ask the same question again, you get a similar answer — but not the same answer. In a consumer lending environment where decisions must be reproducible and defensible, "similar" doesn't cut it. Similar is a liability.
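One way to put a number on that gap between "similar" and "same" is to ask the identical question repeatedly and measure how often the answers agree. The sketch below is illustrative only and does not describe the tooling we actually used. The names `agreement_rate` and `make_fake_model` are hypothetical, and the fake model stands in for whatever real model call you would make.

```python
from collections import Counter
from typing import Callable


def agreement_rate(ask: Callable[[str], str], question: str, runs: int = 10) -> float:
    """Ask the same question `runs` times and return the share of runs that
    produced the most common answer. 1.0 means every run agreed."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Collapse whitespace and case so pure formatting noise is not counted
    # as a different answer. Real systems may need a stricter comparison.
    answers = [" ".join(ask(question).split()).lower() for _ in range(runs)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / runs


def make_fake_model(canned: list[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model call: cycles through canned answers."""
    state = {"i": 0}

    def ask(_question: str) -> str:
        answer = canned[state["i"] % len(canned)]
        state["i"] += 1
        return answer

    return ask
```

A deterministic rules engine would score 1.0 on this check by construction. Anything lower is the variability you have to explain to an examiner.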
What we were confronting was something genuinely new. Every technology we'd ever adopted was prescriptive: you defined the conditions, you defined the outcomes, and the system followed your rules. We had to let go of that grounded assumption entirely and learn to see this differently. This wasn't a better tool. It was a different kind of tool — one where we no longer set the rules, where we no longer explicitly outlined the conditions for a set of defined outcomes. That was the moment we felt both the enormity of how AI could transform our business and the realization that everything we knew about implementing technology had to be rethought.
We didn't abandon the pursuit — we forged a new path forward. Through relentless iteration, we've built real, working AI capabilities and the operational practices to govern them — capturing genuine breakthroughs while protecting our business from the blind spots. There is light at the end of this tunnel. But the rite of passage to get there is significant, and no one gets to skip it.
If you've built anything with AI, you probably recognize this arc. You found a use case. You got excited. You built a prototype. And then you discovered that the gap between a compelling demo and production-grade reliability is a canyon.
The Landscape Shifts Faster Than You Can Build On It
Agentic inconsistency was the challenge that hit us first. But as we pushed deeper, we realized it was just one layer of a much broader problem — one that extends well beyond any single model or use case and into the fundamental nature of how this technology evolves.
Here's the thing the AI labs love to tell you: "The version of the model you're using today is the worst version you'll ever see."
There's truth in that. The models genuinely improve with each generation. Over the past two years, I've watched dozens of version changes. And with every release come the benchmarks (step-function improvements in reasoning, coding, and multi-modal understanding), with reporting that grows more fever-pitched with each new drop. They're not hyping. The improvements are real and growing at a rate that's hard to comprehend.
But here's what you need to understand:
Every new version breaks — or at the very least changes — what you built on the old one.
Model versions are rebuilt, not refined. Each new version is trained, post-trained, fine-tuned through processes that introduce inherent variability. When a new model drops, it doesn't just get better at things — it gets different at things. The tone shifts. The focus areas change. It might be stronger at mathematical reasoning but weaker at nuanced language tasks. The benchmarks confirm this: every model has a different performance profile. And these differences aren't cosmetic.
Google publishes model retirement schedules. Their Gemini 2.0 Flash model launched in February 2025 and retires in June 2026.
Sixteen months. That's the shelf life of the thing you built your workflow around.
Beyond the model itself, there's an entire stack of variables that can independently shift your outcomes. Agent harnesses — the orchestration layer that determines how an AI operates, what tools it can access, and what guardrails constrain it. System instructions, prompt configurations, data retrieval layers, fine-tuning parameters, guardrail policies. Change any one of them and the behavior changes, even if the model stays the same.
If you use these tools daily, you feel this. Something shifts between versions. You can't always name exactly what changed. There's no formula for measuring it. You just know the outputs are different — and you're supposed to bet your operations on that.
If you want a concrete example: OpenAI disclosed this past week that a training incentive applied to just 2.5% of ChatGPT traffic caused their models to reference "goblins" 175% more often — a behavior that cascaded across model generations, persisted after the original feature was retired, and could only be corrected by adding a line to the system prompt asking the model to please stop. That's the current state of the art for fixing unwanted behavior at the world's leading AI lab. Now imagine you're building compliance-grade solutions on top of that.
And now there are agents. Everything we just described — the variability, the shifting ground, the outputs you can't always predict — now has the authority to act on its own at scale.
This isn't a flaw in the technology. It's the reality of a landscape in hyper-accelerated change — where the labs, the platforms, and the tooling are all evolving on independent tracks, at a pace none of us have seen before. The models get better. The capabilities expand. And every improvement reshuffles the ground you're standing on.
Which brings you to the trap. You commit to AI: you invest heavily in understanding it, implementing it, building practices around it, and protecting your business from its blind spots. That investment isn't a line item. It's a transformation of how your organization operates. The budget, the talent, and the time required are overwhelming. Then the model changes. And you do it again. The alternative, waiting until the technology "matures," means you risk never developing the muscle to implement it at all. The organizations learning now, even imperfectly, are building institutional knowledge that compounds. Waiting doesn't reduce risk. It just delays it while your competitors get smarter.
Both of these are true at the same time. You can't afford to fully commit because the ground keeps moving. You can't afford to wait because the learning curve is the asset.
That's the leader's dilemma. And pretending it doesn't exist doesn't make it go away.
The Playbook: What a Leader Actually Does About This
Everything above can feel paralyzing if you let it. It shouldn't.
The paradox is real — the technology is too powerful to ignore and too uncharted to trust blindly. But this isn't a problem you solve and move on from. AI needs a seat at the leadership table — not as a quarterly agenda item, but as a standing priority that demands constant attention. The market hasn't reached steady state. The processes for navigating this landscape are still being written. Until they are, this is leadership work — yours, personally — and it doesn't run on autopilot.
Here's the framework I've built over two years of operating inside this reality.
1. Build the Practice Around the Practice
You cannot rely on the technology being stable. So stop trying.
Instead, build an organizational practice around how you engage with AI — one that assumes the technology will change and plans for it.
This means establishing a dedicated governance structure, such as an internal AI Council, that brings together engineering, legal, compliance, and operations. You don't just need a platform to deploy models; you need a strategic foundation designed to absorb the shock of constant updates. Your practice must prioritize creating parallel testing environments, maintaining "golden datasets" to measure output drift whenever a model version changes, and fostering a culture where rolling back a degraded AI feature is celebrated as a win for quality control, not a failure of innovation.
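To make the golden dataset idea concrete, here is a minimal sketch of a drift check that runs the same expert-approved cases through the current model and a candidate replacement. Everything named here (`GoldenCase`, `drift_report`, `safe_to_promote`) is hypothetical, the models are assumed to be plain callables from prompt to answer, and exact string matching is a simplifying assumption that most real workflows would need to loosen.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    case_id: str
    prompt: str
    expected: str  # the answer your own experts have signed off on


def drift_report(cases: list[GoldenCase],
                 current: Callable[[str], str],
                 candidate: Callable[[str], str]) -> dict:
    """Compare two model versions case by case against the golden answers."""
    if not cases:
        raise ValueError("golden dataset is empty")
    regressions, fixes = [], []
    current_ok = candidate_ok = 0
    for case in cases:
        cur_right = current(case.prompt).strip() == case.expected
        cand_right = candidate(case.prompt).strip() == case.expected
        current_ok += cur_right
        candidate_ok += cand_right
        if cur_right and not cand_right:
            regressions.append(case.case_id)   # worked before, broken now
        elif cand_right and not cur_right:
            fixes.append(case.case_id)         # broken before, works now
    return {
        "current_accuracy": current_ok / len(cases),
        "candidate_accuracy": candidate_ok / len(cases),
        "regressions": regressions,
        "fixes": fixes,
    }


def safe_to_promote(report: dict, max_regressions: int = 0) -> bool:
    """Gate on regressions, not headline accuracy."""
    return len(report["regressions"]) <= max_regressions
```

The gate deliberately looks at per-case regressions. Two versions can post the same overall accuracy while disagreeing on which cases they get right, and that is exactly the kind of drift a single benchmark number hides.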
2. Ask Your Vendors the Hard Questions
A recent survey of 950 banking executives by Grant Thornton found that only 18% are confident they could pass an independent AI audit. Put differently, 82% of the people buying AI-powered tools are not confident they can demonstrate what those tools are actually doing inside their operations.
Here are five questions every leader should be able to answer about every AI vendor. If your vendor can't give you clear answers, that tells you everything you need to know.
"What is your model deprecation schedule, and do we control the migration timeline?"
- The Answer You Need: "We provide at least 90 days' notice before a model is retired, and we offer a sandbox environment for you to test the new version against your specific workflows before you migrate."
"How is our proprietary data firewalled from your underlying training corpus?"
- The Answer You Need: "Your data is strictly isolated. We have a zero-retention policy for API inputs/outputs, and your data is never used to train or fine-tune our foundational models."
"Can you provide a reproducible audit trail for agentic decisions?"
- The Answer You Need: "Yes. Our logging framework captures the exact prompt, the data retrieved, the model version used, and the deterministic rules applied for every single output."
"What is your fallback mechanism if the primary model degrades or hallucinates?"
- The Answer You Need: "We use model routing. If the primary model fails our internal confidence checks or latency thresholds, the system automatically falls back to a deterministic rules engine or a secondary, stable model."
"How do you test for and mitigate 'output drift' between versions?"
- The Answer You Need: "We run automated regression testing against a standardized benchmark dataset for your industry, and we provide you with the delta reports before any update goes live."
These five questions won't cover everything. They're not meant to. They're meant to open the door to dozens of deeper questions your teams should be asking. But if the answers to these five aren't clear, confident, and specific — the deeper questions won't matter.
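For readers who want to see what a good answer to the audit-trail and fallback questions might look like in practice, the following is a rough sketch under stated assumptions, not any vendor's actual implementation. It assumes the primary model is a callable that returns an output together with a confidence score (many models do not expose a calibrated one), and `AuditRecord`, `income_rule`, `decide`, and `to_log_line` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Callable, Tuple


@dataclass(frozen=True)
class AuditRecord:
    timestamp_utc: str
    prompt: str
    retrieved_data: dict
    model_version: str
    primary_output: str
    confidence: float
    route: str          # "primary_model" or "deterministic_fallback"
    final_output: str


def income_rule(data: dict) -> str:
    """Hypothetical deterministic rule: incomes within 5% count as a match."""
    gap = abs(data["stated_income"] - data["verified_income"])
    return "match" if gap <= 0.05 * data["stated_income"] else "refer_to_human"


def decide(prompt: str,
           data: dict,
           primary: Callable[[str, dict], Tuple[str, float]],
           model_version: str,
           fallback: Callable[[dict], str] = income_rule,
           min_confidence: float = 0.9) -> AuditRecord:
    """Use the model when it clears the confidence bar, else the fixed rule.
    Either way, record everything needed to reconstruct the decision."""
    primary_output, confidence = primary(prompt, data)
    if confidence >= min_confidence:
        route, final_output = "primary_model", primary_output
    else:
        route, final_output = "deterministic_fallback", fallback(data)
    return AuditRecord(
        timestamp_utc=datetime.now(timezone.utc).isoformat(),
        prompt=prompt,
        retrieved_data=data,
        model_version=model_version,
        primary_output=primary_output,
        confidence=confidence,
        route=route,
        final_output=final_output,
    )


def to_log_line(record: AuditRecord) -> str:
    """Serialize one decision as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

Note that the record keeps the model's original output even when the fallback overrides it. That lets you measure later how often the two disagree.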
3. Map the AI That's Already in Your Building
You can't govern what you can't see. And right now, AI is embedded in tools across your organization — productivity software, CRM, analytics, communication platforms, document processing — whether anyone made a deliberate decision to adopt it or not.
The question for a leader isn't which tools have AI features. It's whether your organization has control.
Here's what you need answers to:
Do we have a comprehensive "Shadow AI" inventory? Not just the enterprise contracts IT approved, but the browser extensions, SaaS features, and shadow tools your teams activated to hit their deadlines.
What is the blast radius of the tool's outputs? Does the AI directly draft client-facing language or generate numbers for compliance reports? We need to know exactly what human outcomes its outputs influence.
Who owns the underlying model's roadmap? In most cases, it's the vendor. Do we know when they push silent updates, and do we have the ability to opt out if a new feature violates our data policies?
Is our risk taxonomy clearly defined and universally understood? Every business draws the line differently between "low-risk efficiency" and "high-risk liability." But those lines must be codified, documented, and trained on.
What is our human-in-the-loop protocol for high-stakes workflows? For our highest-risk uses, who is checking the output, what is the frequency of review, and what is our kill-switch procedure when a model's behavior shifts unexpectedly?
If you can't get clear answers to these five questions from your team, you have unmanaged risk sitting in your operations right now. And it's growing every time someone turns on a new feature.
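The five questions above translate fairly directly into a structured inventory that can be checked automatically. The sketch below is one possible shape, offered as an assumption and not a standard: the field names, the two-level `RiskTier`, and the three checks in `unmanaged_risks` are all hypothetical and should be replaced by your own risk taxonomy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RiskTier(Enum):
    LOW = "low-risk efficiency"
    HIGH = "high-risk liability"


@dataclass(frozen=True)
class AITool:
    name: str
    approved_by_it: bool
    risk_tier: RiskTier
    output_reaches: str               # blast radius, in plain words
    human_reviewer: Optional[str]     # who checks the output, if anyone
    can_opt_out_of_updates: bool


def unmanaged_risks(inventory: list[AITool]) -> list[tuple[str, str]]:
    """Return (tool name, finding) pairs for gaps a leader should see."""
    findings = []
    for tool in inventory:
        if not tool.approved_by_it:
            findings.append((tool.name, "shadow AI: never approved"))
        if tool.risk_tier is RiskTier.HIGH and tool.human_reviewer is None:
            findings.append((tool.name, "high risk with no human in the loop"))
        if tool.risk_tier is RiskTier.HIGH and not tool.can_opt_out_of_updates:
            findings.append((tool.name, "high risk with no control over vendor updates"))
    return findings
```

An empty result does not mean the organization is safe. It only means the inventory you have written down has no obvious gaps, which is why the inventory itself has to be kept current.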
4. Get Your Hands Dirty. Personally.
This is the one nobody wants to hear. But it's the one that matters most.
Stop Delegating the Discovery: You — the CEO, the division head, the person making the strategic calls — must use these tools yourself. Do not settle for reading the summary your CTO prepared or watching a highly polished vendor demo. You have to build something.
Embrace the Friction: Write a prompt that solves a real problem in your business. Break something and figure out why it broke. Then do it again next week, because the system you used last week has already changed.
Leverage the Low Barrier to Entry: These interfaces are built for natural language. You don't need a computer science degree or Python skills to manipulate them. The barrier to entry is zero; the barrier to mastery is infinite.
Develop "Fingertip Feel": The gap between reading about AI and using AI is the same gap between reading about swimming and being in the water. You only understand the liability of agentic inconsistency when you experience a tool doing something brilliant, followed immediately by something inexplicably wrong. That lived experience is what gives you the judgment to lead.
Make it an Unbreakable Habit: AI will humble you before it helps you. It will complicate your workflow before it streamlines it. Dedicate time every week, at minimum, to stay in the weeds. The leaders who engage personally aren't just better informed; they are the only ones equipped to make governance decisions that actually reflect reality.
In a future article, I'll outline the specific tools and technology stack I've been using over the past two years — how it's evolved, what I've learned, and how it's transformed the way I lead. But the first step isn't picking the right tool. It's committing to picking up any tool at all.
5. Accept the Paradox and Lead Through It
AI is here. It is growing at an unprecedented scale. It is unavoidable, and it is transformative beyond any previous strategic inflection point any of us have faced.
Waiting cannot be an option. The risks of inaction are too high.
Don't make this about ROI. Not yet. The ROI will come, as surely as the seasons turn. The organizations learning now, even imperfectly, are building institutional knowledge that compounds with every iteration. That knowledge is the asset. The returns follow.
Your job isn't to solve AI. It's to create the conditions where your organization can learn, adapt, and use it responsibly while the technology matures. Set the boundaries. Demand the transparency. Build the muscle. And revisit all of it regularly, because the technology will change, and your governance posture has to change with it.
We're not waiting for AI to be perfect. We're constructing the playbook for using it responsibly — brick by brick, use case by use case, with the discipline to know where it belongs and the honesty to admit where it doesn't. Yet.
The Long Game
Two years in, here's what I know.
The organizations that win won't be the ones that adopted fastest. They'll be the ones that built the discipline to adopt well — that treated AI not as a product to buy, but as a capability to govern.
That's the job right now. Not to have all the answers. To build the system that finds them — and to do it knowing the system itself will need to change, probably sooner than you'd like.
That's not chaos. That's leadership.
And keep a watchful eye out for the goblins. They have a way of showing up where you least expect them — and not always in forms you'll recognize.
— Stephen Schrump, CEO, PitchPoint Solutions