80% of Executives Report No Measurable AI ROI. I’m Not Surprised — And I Know Exactly Why.

“A National Bureau of Economic Research study of nearly 6,000 executives found that more than 80% have seen no measurable impact from AI on either employment or productivity over the past three years. This study will be used by AI sceptics as evidence that the whole thing is overhyped. They are wrong about the conclusion. They are right that something is broken. I want to tell you exactly what it is.”

Let me start with what that 80% number actually represents.

It represents executives who bought AI tools and watched nothing change. Who ran pilots that produced interesting demos. Who told their boards they were “leaning into AI” while their operations continued to run exactly as before. Who gave their teams access to ChatGPT and called it an AI strategy.

That is not evidence that AI does not work. That is evidence that access to AI tools is not the same thing as AI transformation. I have been saying this for two years. I am glad someone finally quantified it at scale.

The Uncomfortable Truth the Study Is Really Revealing

When the NBER finding circulates, the natural response from the AI industry is to say the technology is still maturing, or that measurement frameworks are lagging, or that the gains are real but difficult to quantify. All of those things can be true and still be a deflection from the simpler explanation.

The simpler explanation: most organisations did not change anything structural about how they operate. They added AI tools to existing workflows instead of redesigning workflows around AI capability. They measured tool adoption instead of business outcome. And they treated the purchase of a subscription as equivalent to the deployment of a system.

A surgeon who is given a better scalpel but performs the same operation the same way gets a marginally better result. A surgical team that redesigns the procedure around the capabilities of the new instrument gets a transformatively better result. The organisations in that NBER study bought better scalpels. Almost none of them redesigned the operation.

Why the 20% Who Did See Results Are Different

The same study that reports more than 80% seeing no impact implies that up to 20% did see measurable impact. That 20% is the interesting number, and nobody is talking about it.

I have worked with businesses on both sides of this divide. The pattern is consistent with what McKinsey’s State of AI research also confirms: the difference is not budget, not tool selection, not technical sophistication. The difference is whether an organisation started from a workflow question or a tool question.

The 80% started from: “What AI tools should we buy?” They evaluated options, purchased subscriptions, rolled out access, and waited for results.

The 20% started from: “What specific outcome do we want to change, and what would it take to change it?” They mapped the current state of a workflow, identified the precise points where AI could change the output, built a system that changed those points, and measured the outcome they had named before they started building.

Those are not the same question. They produce fundamentally different results. And the gap between them is not a technology gap. It is a thinking gap.

The Three Failure Patterns I See Most Often

Failure Pattern One: The Demo That Became the Product

I have lost count of how many organisations I have talked to that ran an impressive AI pilot, presented it to leadership, received enthusiastic approval, and then continued running the pilot indefinitely. The demo became the proof of concept. The proof of concept became the product. Nobody built a production system. Nobody measured business outcomes. Nobody connected the impressive demo to an actual workflow that ran reliably every day.

A demo is not a deployment. A pilot is not a system. This is the same point made in Harvard Business Review’s analysis of why AI projects fail: the gap between impressive proof-of-concept and production-ready system is almost always larger than organisations expect, because the pilot was optimised for the demo, not for the daily reality it needs to handle.

Failure Pattern Two: Measuring the Wrong Thing

The most common measurement failure I encounter: organisations counting how many people are using AI tools instead of counting what changed in the business because of them. Usage metrics are not outcome metrics. Fifty people using ChatGPT every day and reporting they find it useful is not the same as fifty people producing measurable output faster, with fewer errors, at lower cost.

This is exactly why I introduced the automation ratio metric in an earlier piece: the percentage of AI-assisted outputs that ship without human correction. That is a number you can measure. It tells you whether your AI is producing reliable output or just producing a first draft that someone still has to fix. The organisations in the 20% track something like this. The organisations in the 80% track seats and logins.
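As a rough illustration, the automation ratio can be computed from a log of AI-assisted outputs. The sketch below is a minimal Python example, not a prescribed implementation: the `AssistedOutput` record and its field names are hypothetical, the sample data is made up, and it assumes the denominator is every AI-assisted output produced, with only those that shipped untouched counting toward the numerator.

```python
from dataclasses import dataclass


@dataclass
class AssistedOutput:
    """One AI-assisted output. Field names are hypothetical."""
    output_id: str
    shipped: bool          # reached the customer or downstream system
    human_corrected: bool  # a person edited it before it shipped


def automation_ratio(outputs: list[AssistedOutput]) -> float:
    """Fraction of AI-assisted outputs that shipped without human correction."""
    if not outputs:
        raise ValueError("no AI-assisted outputs to measure")
    clean = sum(1 for o in outputs if o.shipped and not o.human_corrected)
    return clean / len(outputs)


# Made-up sample data for illustration only.
sample = [
    AssistedOutput("a1", shipped=True, human_corrected=False),
    AssistedOutput("a2", shipped=True, human_corrected=True),
    AssistedOutput("a3", shipped=True, human_corrected=False),
    AssistedOutput("a4", shipped=False, human_corrected=False),
]
print(f"Automation ratio: {automation_ratio(sample):.0%}")  # prints "Automation ratio: 50%"
```

Whether discarded outputs belong in the denominator is a judgment call. Including them, as here, penalises drafts that never ship at all, which seems closer to the spirit of the metric.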

Failure Pattern Three: Automating Without Changing the Process

The most expensive mistake: adding AI to a broken process. As I argued in ‘You Don’t Have an AI Problem, You Have a Systems Problem’: automating a broken process makes it break faster. The NBER data suggests this is happening at scale. Organisations that layered AI onto existing workflows without examining whether those workflows were well-designed got AI-speed execution of mediocre processes. The productivity gain from a mediocre process running faster is small. The frustration from it still producing poor output is large. Neither registers as a win in a productivity measurement.

What the WEF Data Adds to This Picture

The World Economic Forum’s Summer Davos in Dalian this week put the same tension in a global employment context. Their analysis: 40% of global employment is exposed to AI-driven change. An estimated 262 million young people worldwide are not in employment, education, or training. And current projections suggest only 400 million new jobs will be created for 1.2 billion young people reaching working age in the next 15 years.

The WEF’s argument, one I find more honest than most: AI tools do not create livelihoods by themselves. They only create value when people deploy them to build businesses, solve real problems, and open new markets. The technology is necessary but not sufficient. The human decision to build something useful with it is what actually produces economic value.

That is not an argument against AI. It is an argument against passive AI adoption, which is exactly what the NBER study is measuring the failure of. The organisations that reported no productivity impact were passively adopting. The ones that reported impact were actively building.

The Connection to the 74% Rollback Rate

This NBER finding connects directly to the GSPANN research on AI agent rollbacks published this week: 74% of enterprise AI agent deployments get rolled back, almost always because of missing governance architecture, not because the technology failed. The organisations in the 80% no-impact group and the 74% rollback group share the same underlying failure: they started with tools instead of outcomes, and they deployed without the discipline that production systems require.

The Gartner forecast of $206.5 billion in AI agent spending for 2026 — up 139% in a single year — is being driven by organisations moving from experimentation to procurement. Some of those organisations will be in the 20% that sees measurable impact. Most, if the NBER data is any guide, will join the 80% unless something changes about how they approach deployment.

What changes it is exactly what I outlined in ‘The 74% That Got Rolled Back’: define rollback criteria, build observability in, scope agents narrowly, map dependencies, and establish audit trails before launch. That discipline is not a technology capability. It is a deployment discipline. It is available to any organisation today.

The Distinction I Make With Every New Client

When I start working with a new client at Hexona Systems, I ask one question before we discuss any tool, model, or workflow: what is the specific number you want to change, and by how much, and by when?

Most of the time, the answer is vague. “We want to be more efficient.” “We want to save time.” “We want to reduce manual work.” Those are not targets. Those are sentiments. You cannot build a system toward a sentiment. You build a system toward a number.

The clients who end up in the 20% are the ones who arrive with clear numbers: our lead response time needs to drop from 4 hours to 15 minutes; our weekly report compilation needs to go from 6 hours to 30 minutes; our support resolution rate needs to improve from 61% to 80% on first contact. Not because they are more sophisticated or better funded. Because they know what they are trying to change.

The clients who come in with sentiments and no targets are the clients who, six months later, are still running pilots. Still producing demos. Still unable to answer whether AI is working for them. They will show up in the next NBER study as part of the 80%.

The Honest State of AI ROI in 2026

I want to be direct about something I do not always say loudly enough. The NBER study is measuring a real failure. Not a technology failure. A deployment and strategy failure. And that failure is widespread enough to show up at 80% in a survey of 6,000 executives. That should concern everyone building AI automation businesses, including me.

The spending is real. The returns are concentrated. The gap between the two is filled by organisations that bought tools and called it strategy.

I think this gap closes over the next two to three years, as usage-based AI pricing (which makes wasted AI spend visible on an invoice), governance requirements (which force organisations to document what their AI is actually doing), and maturing evaluation frameworks collectively force the 80% to get more deliberate. The Druid AI benchmark on containment rates — showing 80 to 99.5% containment for AI agents resolving service interactions end-to-end in organisations with proper deployment discipline — demonstrates what is possible. Those results come from the same discipline the 80% are skipping.

Four Questions to Ask Before Your Next AI Investment

What specific number are we trying to change?

Not ‘be more efficient.’ Not ‘save time.’ A number. Response time. Resolution rate. Cost per acquisition. Weekly hours on a specific task. If you cannot name the number, you are not ready to invest.

What does the current workflow look like step by step?

Before any AI touches it. Written down. With specific inputs, specific decisions, specific outputs. If you cannot map the current workflow, you cannot redesign it around AI capability. The five-step no-code automation workflow guide covers exactly how to do this mapping before touching any tool.

How will we know in 60 days whether it is working?

Not ‘the team seems to be using it more.’ Not ‘the demo looked impressive.’ A specific measurement, at a specific point in time, against the specific number from question one. If you cannot answer this, you will not be able to measure impact and you will end up in the 80%.

Who owns the outcome?

Not who owns the tool licence. Who is accountable for the number changing. AI implementations with no named human owner of the targeted business outcome drift into maintenance mode without ever producing the result they were deployed for. Name the person. Give them authority. Hold them to the number.
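The four questions above can be captured in one small record before any tool is chosen. The following is a hedged Python sketch: the `OutcomeTarget` class, its field names, the owner title and the measured value of 60 minutes are all hypothetical, and the only figures taken from this article are the 4-hour-to-15-minute lead response target and the 60-day review point.

```python
from dataclasses import dataclass


@dataclass
class OutcomeTarget:
    """A named number to change. All field names are hypothetical."""
    metric: str       # question 1: the specific number
    baseline: float   # measured on the current, mapped workflow (question 2)
    goal: float       # where the number should end up
    review_days: int  # question 3: when to check
    owner: str        # question 4: who is accountable


def gap_closed(target: OutcomeTarget, measured: float) -> float:
    """Fraction of the baseline-to-goal gap closed so far.

    Works whether the metric should fall or rise; 1.0 means the goal is met.
    """
    gap = target.goal - target.baseline
    if gap == 0:
        raise ValueError("goal must differ from baseline")
    return (measured - target.baseline) / gap


lead_response = OutcomeTarget(
    metric="lead response time (minutes)",
    baseline=240,  # 4 hours
    goal=15,
    review_days=60,
    owner="Head of Sales",  # placeholder owner
)

# Hypothetical reading at the review point: 60 minutes.
print(f"{gap_closed(lead_response, measured=60):.0%} of the gap closed")  # prints "80% of the gap closed"
```

If any field cannot be filled in, that is a sign the corresponding question has not been answered yet.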

The Bottom Line

The NBER study showing 80% of executives seeing no measurable AI impact is the most important data point about AI adoption published this year. Not because it proves AI is overhyped, which it does not. Because it proves that tool adoption without workflow redesign, without clear targets, and without measurement discipline does not produce results.

The 20% who did see impact did not have better tools. They had a clearer answer to the question: what specific number are we trying to change? As I argued in ‘Stop Chasing the Biggest Model’, and as Satya Nadella’s own viral essay about building learning loops rather than renting model access also argues, the competitive advantage in AI is not in which tool you buy. It is in how deliberately you build with it.

That question is available to you today. The answer is what separates the businesses compounding with AI from the ones treading water with better tools. Start there. Everything else follows.

Frequently Asked Questions

Does the NBER study mean AI is overhyped?

No. The NBER study measures whether organisations have seen measurable productivity or employment impact, not whether AI is capable of producing such impact. The 20% that did see measurable impact demonstrates the technology works when deployed with clear targets and proper workflow redesign. The 80% reflect a deployment and strategy failure, not a technology failure.

What is the difference between AI tool adoption and AI transformation?

Tool adoption is giving your team access to AI tools and measuring how often they use them. AI transformation is redesigning specific workflows around AI capability and measuring the business outcomes that change as a result. Tool adoption with no workflow redesign produces marginal improvements. Workflow redesign with AI as a core component produces structural efficiency gains. The NBER study is measuring the results of the former.

How should I measure whether AI is working in my business?

Start with a specific, pre-defined business metric: response time, resolution rate, cost per unit, weekly hours on a specific task. Measure it before AI deployment. Measure it again at 30 and 60 days. Track the automation ratio — the percentage of AI-assisted outputs that ship without human correction — as a leading indicator. If the business metric has not moved and the automation ratio is below 60%, the workflow design or context richness needs to change, not the model.
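The rule of thumb in this answer can be written as a simple check. This is only a sketch in Python under stated assumptions: the function name and parameters are hypothetical, and the 5% tolerance used to decide that a metric "has not moved" is an assumption, not a figure from the study or this article. Only the 60% automation ratio floor comes from the text above.

```python
def needs_workflow_redesign(
    baseline: float,
    latest: float,
    automation_ratio: float,
    lower_is_better: bool = True,
    min_improvement: float = 0.05,  # assumed tolerance, not from the article
    ratio_floor: float = 0.60,      # the 60% threshold mentioned above
) -> bool:
    """True when the metric has not moved and the automation ratio is low."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute relative change")
    change = (baseline - latest) if lower_is_better else (latest - baseline)
    improvement = change / abs(baseline)
    return improvement < min_improvement and automation_ratio < ratio_floor


# Hypothetical 60-day reading: response time barely moved, ratio at 45%.
print(needs_workflow_redesign(baseline=240, latest=235, automation_ratio=0.45))  # prints "True"
```

A `True` result points at the workflow design or the context given to the model, not at the choice of model.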

Why do most AI pilots fail to become production systems?

Pilots are designed to demonstrate capability, not to run reliably at volume. They use the best available inputs, the most carefully crafted prompts, and the closest human oversight. Production systems face messy real-world inputs, edge cases, and variable data quality. The gap between impressive pilot and reliable production system is almost always larger than organisations expect, because the pilot was optimised for the demo, not for daily reality.

Where should I start if I want to be in the 20%?

Name one specific workflow, one measurable outcome, and one accountable person. Then read the five-step no-code automation guide to understand how to map the workflow before touching any tool. Build the simplest version that changes the number, measure it for 60 days, then iterate. If you want hands-on guidance, get in touch with the Hexona Systems team — most automations can be scoped and built within a week.

About

Hamza Baig is the founder of Hexona Systems, an automation agency and software platform that helps thousands of entrepreneurs and business owners implement AI-powered workflows at scale.