AI Agents Are Getting Good at Code — But the Real Opportunity Hasn't Started Yet

A major new report from Anthropic just confirmed something I've been watching closely for months — and it changes how every forward-thinking operator should be thinking about AI agents right now.

The data is based on millions of real interactions between humans and AI agents across Claude Code and public API environments. It's the most grounded picture we've had of how AI agents are actually being used in the wild. And the findings tell two very different stories depending on where you sit in the market.

The Headline: Agents Work — But Mostly for Developers

Let's start with what the numbers show.

Software development accounts for nearly half of all agent-based activity. AI agents are being used to write code, run it, test it, and increasingly to work through complex development tasks with minimal human input along the way.

The autonomy numbers are striking. Among the longest working sessions tracked, the amount of time an agent worked independently without human intervention almost doubled in just three months — from under 25 minutes to over 45 minutes. That's not incremental progress. That's a fundamental shift in how much independent work these systems can handle.

Outside of programming, though, adoption remains narrow. Early applications are appearing in healthcare, finance, and cybersecurity, but they are still small in scale and deliberately cautious in scope — focused almost entirely on tasks that are reversible, low-risk, and easy to check.

Why Programming Got There First

The Environment Is Forgiving

Code is one of the best possible environments for an AI agent to operate in, and not by accident. When a model writes a line of code that doesn't work, the error is immediate, measurable, and correctable. You run it, it fails, you fix it. The feedback loop is tight, the mistakes are visible, and nothing catastrophic happens before a human gets a chance to review.

That's exactly the kind of environment where agents thrive. The task is structured. The success criteria are clear. The cost of being wrong is low enough to tolerate while the agent iterates toward a working result.

The Output Is Verifiable

There's another reason agents work well in development: you can test the output. A function either runs or it doesn't. A test either passes or fails. Developers have spent decades building frameworks for verifying whether code does what it's supposed to do, and AI agents slot into those frameworks naturally.

In most other domains — finance, legal, healthcare, customer communication — verification is harder, slower, and more expensive. That's not a permanent barrier, but it's a real one, and it explains why adoption outside programming is still tentative.

The Gap Between Where Agents Are and Where They Need to Go

Low-Risk Tasks Are a Starting Point, Not a Ceiling

The Anthropic researchers are clear that current non-programming use is dominated by tasks that can be undone or corrected. Think document classification. Scheduling. First-draft generation. Data formatting. These are genuinely useful — they save time and reduce manual work — but they represent a fraction of what AI agents are capable of.

The ceiling is much higher. Agents that can autonomously manage client onboarding sequences, reconcile financial workflows, conduct multi-step research, or coordinate across systems and platforms are not science fiction. They're early-stage commercial reality. What's missing isn't the capability. It's the control infrastructure around it.

The Human-AI Interface Needs to Evolve

Anthropic's researchers are direct about what broader adoption requires: new ways of monitoring and controlling the interaction between humans and AI agents. Right now, the tools for overseeing autonomous AI work — understanding what the agent did, why it made the decisions it made, and where it needs course correction — are not mature enough to give organisations the confidence they need to hand over higher-stakes workflows.

This is the bottleneck. And it's the most important thing the industry needs to solve.

What This Means for Operators and Business Leaders Right Now

The Early Mover Advantage Is Real

Here's what I want the people in my community to understand: the fact that AI agent adoption outside of programming is still limited is not a reason to wait. It's a reason to move.

Every major technology shift has a window — a period between "early adopters" and "industry standard" — where the gap in capability between those who are building with the new tools and those who are still watching is at its widest. That window is open right now for AI agents in business operations.

The organisations that are quietly building reversible, well-monitored agent workflows in their operations today are the ones that will have the infrastructure, the experience, and the institutional knowledge to scale when the control tools catch up.

Start Where the Researchers Say Agents Work Best

The Anthropic data gives a clear blueprint for where to start. Focus on tasks that share the properties that made programming such a natural fit for agents: structured inputs, clear success criteria, reversible outcomes, and verifiable results.
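One way to make that checklist concrete is to score candidate workflows against the four properties. The sketch below is my own illustration, not something from the Anthropic report: the `TaskProfile` class, the function names, and the example tasks are all hypothetical, and the rule "start only where all four properties hold" is an assumption you may want to loosen.

```python
from dataclasses import dataclass

# The four properties named above. The field names are illustrative assumptions.
PROPERTIES = (
    "structured_inputs",
    "clear_success_criteria",
    "reversible_outcomes",
    "verifiable_results",
)


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical checklist for one candidate workflow."""
    name: str
    structured_inputs: bool
    clear_success_criteria: bool
    reversible_outcomes: bool
    verifiable_results: bool


def missing_properties(task: TaskProfile) -> list[str]:
    """Return the properties this task does not satisfy."""
    return [p for p in PROPERTIES if not getattr(task, p)]


def is_good_first_candidate(task: TaskProfile) -> bool:
    """Assumed rule: a first agent workflow should satisfy all four properties."""
    return not missing_properties(task)


# Example tasks (made up for illustration).
doc_classification = TaskProfile("document classification", True, True, True, True)
payment_release = TaskProfile("payment release", True, True, False, True)

for task in (doc_classification, payment_release):
    gaps = missing_properties(task)
    verdict = "start here" if not gaps else "wait, missing: " + ", ".join(gaps)
    print(f"{task.name}: {verdict}")
```

The point of returning the missing properties rather than a bare yes or no is that the gaps tell you what to build next: a workflow that fails only on reversibility may become a candidate once you add a review step in front of it.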

In practical terms for most businesses, that means automating client data collection and classification, building agent-assisted workflows for document processing, deploying agents for internal knowledge retrieval and summarisation, and using agent-driven sequences for lead qualification and outreach — tasks where the output can be reviewed before anything consequential happens.

Build trust with the technology at this level before expanding into higher-stakes territory. That's not caution. That's strategy.

The Autonomy Curve Will Keep Moving

The near-doubling of autonomous session length in three months, from under 25 minutes to over 45 among the longest sessions, is not a statistic to gloss over. It tells you something important about the trajectory: the amount of independent work these systems can handle is growing faster than most people expect.

Businesses that understand agent architecture now, that have internal processes designed for human-AI collaboration, and that have already mapped their workflows for automation will absorb these capability improvements immediately. Businesses that are still debating whether to engage will spend that time catching up.

The Monitoring and Control Problem Is an Opportunity in Disguise

It would be easy to read the Anthropic researchers' call for better human-AI oversight frameworks as a warning. I read it as something else: a clear signal of where the next layer of infrastructure value will be built.

The organisations — and the platforms — that solve the visibility and control problem for AI agents will define what enterprise-grade automation looks like for the next decade. Knowing what your agent did, being able to audit its decisions, and having reliable intervention points when something goes off track are not nice-to-haves. They are the conditions that unlock trust — and trust is what unlocks scale.

If you're building automation systems for clients or internal operations, start designing for auditability now. It will be the difference between tools people tolerate and tools people trust.
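To show what designing for auditability can look like at its simplest, here is a minimal sketch. It assumes a hypothetical `AuditedRunner` wrapper of my own invention (not any real agent framework's API) that records every action with its rationale and routes irreversible actions through a human approval hook before they run.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class AuditEntry:
    """One record of what the agent tried to do and why."""
    timestamp: str
    action: str
    rationale: str
    status: str  # "executed" or "blocked"


@dataclass
class AuditedRunner:
    """Hypothetical wrapper: logs every step, gates irreversible ones."""
    approve: Callable[[str], bool]  # assumed human-approval hook
    log: list[AuditEntry] = field(default_factory=list)

    def run(self, action: str, rationale: str, reversible: bool,
            execute: Callable[[], None]) -> bool:
        # Reversible steps run directly; irreversible ones need approval first.
        allowed = reversible or self.approve(action)
        if allowed:
            execute()
        self.log.append(AuditEntry(
            timestamp=datetime.now(timezone.utc).isoformat(),
            action=action,
            rationale=rationale,
            status="executed" if allowed else "blocked",
        ))
        return allowed


# Usage: a reviewer who declines every irreversible action.
performed: list[str] = []
runner = AuditedRunner(approve=lambda action: False)
runner.run("draft_email", "lead matched criteria", True,
           lambda: performed.append("draft_email"))
runner.run("send_invoice", "onboarding complete", False,
           lambda: performed.append("send_invoice"))

for entry in runner.log:
    print(entry.action, entry.status, "-", entry.rationale)
```

Blocked actions are logged alongside executed ones on purpose. In this sketch the record of what the agent wanted to do, and why, is as much a part of the audit trail as the record of what it actually did.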

The Bigger Picture

AI agents are not waiting for permission to become a defining feature of how work gets done. The Anthropic data shows they're already embedded in the workflows of developers at meaningful scale — and the autonomy levels are rising at a pace that makes this year's numbers look conservative by next year's standards.

The question for every business leader and operator is not whether AI agents will become part of how your organisation functions. It's whether you'll be ahead of that shift or scrambling to catch up when it becomes unavoidable.

The window to build intelligently — to start with low-risk, reversible workflows, develop your monitoring instincts, and build agent-ready processes — is open right now.

Use it.

About

Hamza Baig is the founder of Hexona Systems, an automation agency and software platform that helps thousands of entrepreneurs and business owners implement AI-powered workflows at scale.