If you've been using Claude Code over the past several weeks and something felt off — your instincts were right. Anthropic just published a post-mortem confirming that three separate issues degraded Claude Code's quality between early March and April 20, 2026.
The problems touched reasoning effort, session memory, and output verbosity. Together, they created what appeared to be broad, inconsistent degradation across the tool.
As someone who spends every day inside AI automation workflows, I want to break down exactly what happened, why it matters to practitioners like us, and what it tells us about building resilient AI-powered systems.
Anthropic changed Claude Code's default reasoning effort from high to medium to address long latency spikes in Opus 4.6. The intent was reasonable — some users were seeing the UI appear frozen during extended thinking sessions. But the tradeoff was the wrong one.
Users immediately noticed Claude felt less intelligent. Anthropic shipped UI changes to make the effort setting more visible, but most people never changed the default. The fix, reverting to high effort (and bumping Opus 4.7 users to xhigh), did not land until April 7.
What this means for automation builders: Default settings in AI tools are not neutral. They encode tradeoffs that directly affect output quality. If you're building workflows on top of Claude Code or any AI coding assistant, you should explicitly set reasoning parameters rather than trusting defaults — especially after a product update.
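One way to make that habit concrete is to refuse to run a workflow unless the settings it depends on were written down explicitly, and to compare them against what the tool reports after each update. The sketch below is a minimal illustration only: the key names `model` and `reasoning_effort` and the `reported` dictionary are hypothetical placeholders, not real Claude Code options or API fields.

```python
from dataclasses import dataclass

# Hypothetical setting names for illustration; not real Claude Code options.
REQUIRED_SETTINGS = ("model", "reasoning_effort")


@dataclass(frozen=True)
class PinnedSettings:
    model: str
    reasoning_effort: str


def pin_settings(config: dict) -> PinnedSettings:
    """Return the settings only if every required key was set explicitly."""
    missing = [key for key in REQUIRED_SETTINGS if not config.get(key)]
    if missing:
        raise ValueError(f"set these explicitly instead of relying on defaults: {missing}")
    return PinnedSettings(model=config["model"], reasoning_effort=config["reasoning_effort"])


def drifted(pinned: PinnedSettings, reported: dict) -> dict:
    """Map each setting that changed to an (expected, actual) pair."""
    changes = {}
    for key in REQUIRED_SETTINGS:
        expected = getattr(pinned, key)
        actual = reported.get(key)
        if actual != expected:
            changes[key] = (expected, actual)
    return changes
```

Running `drifted` as a pre-flight check after every tool update turns a silent default change into a visible failure.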
This one is the most technically interesting — and the most damaging in practice.
Anthropic introduced a caching optimization designed to reduce latency when resuming idle sessions. The design was clean: after an hour of inactivity, clear old thinking blocks from the session to reduce uncached token costs. Simple enough.
The implementation had a critical bug. Instead of clearing once upon resumption, it cleared thinking history on every subsequent turn for the rest of the session. Once a session crossed the idle threshold, each new request instructed the API to retain only the most recent reasoning block and discard all preceding blocks. Compounding the issue, if a follow-up message arrived mid-tool-use, that started a new turn under the broken flag, stripping even the current reasoning.
The result? Claude kept executing tasks with progressively less memory of why it was doing them. Users reported forgetfulness, repetitive responses, and strange tool choices. On top of that, the repeated cache misses caused usage limits to drain faster than expected.
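The difference between "clear once on resumption" and "clear on every turn afterwards" is easier to see in a toy model. The simulation below is not Anthropic's implementation. It assumes that pruning keeps only the most recent reasoning block, and it tracks how many earlier blocks remain visible at each turn under the intended behavior and under the bug as described above.

```python
IDLE_THRESHOLD_S = 3600  # one hour of inactivity, per the post-mortem description


def run_session(turn_gaps_s, buggy):
    """Return how many earlier thinking blocks are visible at each turn.

    turn_gaps_s: seconds of idle time before each turn.
    buggy: if True, pruning repeats on every turn once the threshold is crossed.
    """
    thinking = []         # reasoning blocks retained so far
    crossed_idle = False  # sticky flag: the session has been idle too long at least once
    visible = []
    for turn, gap in enumerate(turn_gaps_s):
        just_resumed = gap >= IDLE_THRESHOLD_S
        crossed_idle = crossed_idle or just_resumed
        # Intended: prune only on the turn that resumes after the idle gap.
        # Buggy: prune on that turn and on every turn after it.
        prune = crossed_idle if buggy else just_resumed
        if prune:
            thinking = thinking[-1:]  # keep only the most recent block
        visible.append(len(thinking))
        thinking.append(f"reasoning for turn {turn}")
    return visible


gaps = [0, 10, 10, 4000, 10, 10, 10]
print(run_session(gaps, buggy=False))  # [0, 1, 2, 1, 2, 3, 4]
print(run_session(gaps, buggy=True))   # [0, 1, 2, 1, 1, 1, 1]
```

In the intended case, context rebuilds after the one-time prune. In the buggy case it never grows past a single block, which matches the reported pattern of a model that keeps working while forgetting why.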
What this means for automation builders: Long-running or multi-session AI workflows are particularly vulnerable to this class of bug. If your automations involve Claude working through complex, multi-step tasks — especially with idle periods between turns — session context integrity is something you need to actively verify, not assume. Build checkpoints into your workflows. Log reasoning steps externally when the stakes are high.
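A checkpoint log does not need to be elaborate. The sketch below appends each step's goal and rationale to a JSON Lines file outside the AI session and builds a short recap that can be pasted into the next prompt after an idle gap. The function names and record fields are illustrative assumptions, not part of any Claude Code feature.

```python
import json
import time
from pathlib import Path


def log_checkpoint(path, step, goal, reasoning):
    """Append one checkpoint record as a JSON line."""
    record = {"ts": time.time(), "step": step, "goal": goal, "reasoning": reasoning}
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def load_checkpoints(path):
    """Read all checkpoint records, or an empty list if none exist yet."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def resume_preamble(path, last_n=3):
    """Build a short recap of the most recent steps for the next prompt."""
    recent = load_checkpoints(path)[-last_n:]
    lines = [f"Step {r['step']} ({r['goal']}): {r['reasoning']}" for r in recent]
    return "\n".join(lines)
```

Because the log lives outside the session, it survives any loss of in-session context and doubles as an audit trail when a run goes wrong.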
In preparing for the Opus 4.7 launch, Anthropic added a system prompt instruction to rein in the model's naturally verbose outputs. The specific instruction capped inter-tool text to 25 words and final responses to 100 words.
The prompt passed weeks of internal testing with no regressions. But a broader ablation study later revealed a 3% drop in coding performance for both Opus 4.6 and 4.7. It was reverted on April 20.
What this means for automation builders: System prompt design is not just a prompting exercise — it's an engineering discipline with measurable performance consequences. A single constraint line, even one that seems cosmetically harmless, can have downstream effects on task quality. If you're maintaining system prompts for AI agents in production, treat every change like a code deployment: version it, test it against a benchmark suite, and roll it back if anything regresses.
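Treating a prompt change like a deployment can be sketched in a few lines: give every prompt text a content-hash version, score it against a benchmark, and keep the previous version live if the score regresses. In the sketch below, `score_fn` stands in for whatever benchmark suite you run, and the `max_drop` threshold is an assumed value, not a recommendation from the post-mortem.

```python
import hashlib
from dataclasses import dataclass, field


def prompt_version(text: str) -> str:
    """Short content hash so every prompt change gets a distinct version id."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


@dataclass
class PromptRegistry:
    max_drop: float = 0.01  # tolerated benchmark score drop (assumed threshold)
    history: list = field(default_factory=list)  # accepted (version, text, score) tuples

    def current(self):
        """Return the live (version, text, score) tuple, or None if empty."""
        return self.history[-1] if self.history else None

    def propose(self, text: str, score_fn) -> bool:
        """Score a candidate; accept only if it does not regress past max_drop."""
        score = score_fn(text)
        baseline = self.current()
        if baseline is not None and score < baseline[2] - self.max_drop:
            return False  # rejected: the previous prompt stays live
        self.history.append((prompt_version(text), text, score))
        return True
```

With a benchmark that scores the baseline prompt at 0.80 and a word-capped variant at 0.77, `propose` rejects the variant and `current()` still returns the baseline, which is the rollback behavior you want by default.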
Anthropic was transparent about this: each issue affected a different slice of traffic on a different timeline, so the aggregate pattern looked like normal variation at first. Internal evals didn't reproduce the problems because two unrelated server-side experiments masked the caching bug in most CLI sessions.
It was ultimately user feedback — specific, reproducible examples shared via the /feedback command and public posts — that gave Anthropic the signal they needed to investigate and confirm the root causes.
This is worth sitting with for a moment. The most sophisticated AI lab in the world, with extensive internal testing infrastructure, needed its power users to surface these issues. That's not a failure of Anthropic's engineering — it's a structural reality of how complex AI systems behave in production at scale.
Anthropic has committed to several process improvements:
They've also created @ClaudeDevs on X as a dedicated channel for explaining product decisions in depth, and will share parallel updates on GitHub.
As a practitioner, I'll be watching closely for:
This incident is a case study in what I call invisible regression — when an AI tool you depend on quietly gets worse, and the degradation is subtle enough that you second-guess yourself before you trust your instincts.
Here's the framework I use, and that I'd encourage you to adopt:
Before you automate anything critical with an AI tool, run a set of standard benchmark tasks and record the outputs. Date-stamp them. Re-run them after every major update. If outputs shift meaningfully, you'll know immediately — rather than spending weeks wondering if you're prompting wrong.
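A baseline like this can start as a small script. The sketch below saves date-stamped outputs for a set of named tasks and later flags any task whose new output differs substantially from the recorded one. Text similarity via `difflib` is a crude proxy chosen for simplicity, and the 0.8 threshold is an assumption. For code tasks, running the output against tests is a stronger check.

```python
import difflib
import json
from datetime import date


def record_baseline(outputs: dict, path: str) -> None:
    """Save date-stamped outputs keyed by task name."""
    snapshot = {"recorded_on": date.today().isoformat(), "outputs": outputs}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(snapshot, fh, indent=2)


def compare_to_baseline(new_outputs: dict, path: str, min_similarity: float = 0.8) -> list:
    """Return the task names whose new output fell below min_similarity."""
    with open(path, encoding="utf-8") as fh:
        baseline = json.load(fh)["outputs"]
    shifted = []
    for task, old in baseline.items():
        new = new_outputs.get(task, "")  # a missing task counts as fully shifted
        ratio = difflib.SequenceMatcher(None, old, new).ratio()
        if ratio < min_similarity:
            shifted.append(task)
    return shifted
```

Re-running the same tasks after each update and passing the fresh outputs to `compare_to_baseline` gives you a list of tasks to inspect by hand, rather than a vague sense that something changed.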
When a new version of a Python package ships, you check the changelog and test before deploying to production. The same discipline applies to AI tool updates. Anthropic changed Claude Code's default reasoning effort, added a system prompt instruction, and shipped a caching optimization — all without the kind of visibility that a proper software release would give you. Stay close to changelogs, community channels, and developer updates like @ClaudeDevs.
Anthropic explicitly credited user reports for helping them identify and resolve these issues. If something feels wrong with an AI tool you rely on, say so — specifically, reproducibly, and publicly if necessary. The feedback loop between practitioners and AI labs is one of the most valuable mechanisms we have for improving these tools. Use it.
Anthropic handled this well. Publishing a detailed post-mortem, resetting usage limits for all subscribers, and committing to concrete process changes is the right response. But the episode is a reminder that even the best AI tools are engineering systems — subject to bugs, tradeoffs, and unintended interactions.
The practitioners who thrive in the AI-first era won't just be the ones who use these tools — they'll be the ones who understand how they work, notice when they break, and build workflows resilient enough to handle the moments when they do.
That's the edge. And it's exactly what we build here.
Hamza Baig is the founder of Hexona Systems, an automation agency and software platform that helps thousands of entrepreneurs and business owners implement AI-powered workflows at scale.