Why did Claude Code get worse in March 2026?

Anthropic's postmortem confirmed three separate engineering bugs: a reasoning effort downgrade on 4 March (from 'high' to 'medium'), a caching bug on 26 March that wiped session memory every conversation turn, and a verbosity-reduction system prompt shipped on 16 April with Opus 4.7 that dropped coding performance by 3%. All three were resolved by 20 April 2026 in version 2.1.116.

What were the three Claude Code bugs Anthropic admitted in their postmortem?

Bug one: Claude Code users were silently moved from 'high' to 'medium' reasoning effort on 4 March. Bug two: a caching bug wiped session memory every single conversation turn from 26 March, meaning each prompt was effectively the first prompt. Bug three: a new verbosity-reduction system prompt shipped with Opus 4.7 on 16 April, which the postmortem acknowledged caused a 3% coding performance drop and described as 'the wrong tradeoff.'

How can I tell if my AI coding tool has degraded?

Build a small adversarial test suite of your ten most common tasks. Run it weekly and log the scores. This gives you a baseline. When a vendor quietly changes something, your scores will drop before you read any announcement. Without this kind of measurement in place, you're relying entirely on the vendor to tell you when something's off. Anthropic's own postmortem is a good case study in why that's not a reliable strategy.

Is Claude Code reliable for production work?

Claude Code is a capable tool. The issue isn't whether it's capable, it's whether any single AI vendor's tooling is reliable enough to carry mission-critical workflows on its own. The April 2026 incident shows that silent degradations can run for weeks before being acknowledged. The practical takeaway is to measure your own outputs rather than assume consistency, and to avoid building workflows where one vendor's performance change becomes your emergency.

What's the alternative to Claude Code for Australian developers?

Australian developer teams are actively using Cursor, GitHub Copilot, Cody, and Codeium alongside or instead of Claude Code. The more useful question isn't which tool to switch to, it's whether your workflow is resilient enough that any single tool's degradation doesn't derail you. Single-vendor dependency is the actual risk, not the choice of which vendor.

Anthropic told its paying customers Claude Code was fine. It wasn't. For a month.

Sometime in early March I noticed Claude Code wasn't quite doing what it used to do. The suggestions felt shallower. The reasoning chains were shorter. Tasks that used to resolve in a prompt or two were taking four or five. I assumed it was me.

The first week I figured I'd had a run of sloppy prompts. Happens. The second week I started second-guessing the whole structure of how I was asking things. Maybe I'd gotten complacent. Maybe I needed to be more specific. I rewrote prompts that had worked fine in February. I started adding more context, more constraints, more examples. I told our junior devs to do the same.

The third week I started screen-recording sessions. Not for any particular reason. Just to have a record. Something to look back at and figure out where I was going wrong.

I am not joking when I tell you I wrote a 600-word memo to myself titled "why am I bad at this all of a sudden."

It had sections. It had a table of what I'd tried. It had a conclusion that read, roughly, "I think the issue is that I've been doing this long enough to develop bad habits without realising." I sent a version of it to a couple of people at Webcoda. They agreed the prompts could be better. We all quietly adjusted.

On 23 April, Anthropic published an engineering postmortem. It confirmed three separate bugs that had degraded Claude Code's performance between 4 March and 20 April. Not user error. Not prompting habits. Three bugs. All on their end. All happening while the public response to developer complaints was some variation of "we don't see degradation in our metrics."

I felt vindicated. Then I felt irritated. Then I thought about it for a while and wrote this instead.

If your business is paying for an AI tool, the part where the vendor takes you seriously when something feels off is not optional. It's the entire product.

The Three Bugs

The postmortem is worth reading in full. Anthropic published it at anthropic.com/engineering/april-23-postmortem on 23 April 2026. It's honest, it's detailed, and it's the kind of thing that makes you simultaneously respect the company and stay very annoyed at the preceding seven weeks.

Here's what actually happened.

Bug one: The reasoning effort downgrade (4 March)

Claude Code users were silently moved from "high" reasoning effort to "medium." Most developers wouldn't notice this directly because the setting isn't always surfaced in the interface. The output just gets shallower. Reasoning chains get shorter. The model stops working quite as hard on problems that need working hard on.

From "high" to "medium." That's the kind of change you make to your kid's screen time, not your enterprise AI.

The thing about this particular bug is that it's invisible by design. There's no message that says "your reasoning effort has been downgraded." The code suggestions just gradually feel less considered. You start thinking you're writing harder problems. You start adding more scaffolding to your prompts. You write memos to yourself about your declining skill level.

Bug two: The caching bug (26 March)

Three weeks after the first bug, a caching bug wiped session memory every conversation turn. So every prompt was effectively the first prompt. No context retention. The model had no memory of what you'd been working on five minutes earlier.

Imagine your assistant forgetting what you're working on every single time you ask a follow-up question. Now imagine paying $100 a month for that.

This one is brutal for any multi-step development task, which is to say most of them. Debugging a complex problem requires the model to hold a mental model of the codebase, the error, and the attempted fixes. When that resets on every turn, you're not having a conversation with an AI. You're repeatedly introducing yourself to one.

Bug three: The verbosity system prompt (16 April)

Anthropic shipped Claude Opus 4.7 with a new system prompt designed to make responses more concise. That's a reasonable goal. Verbose AI output is genuinely annoying. The problem was that the new instruction cost 3% on coding benchmarks.

The postmortem language on this one is unusually direct: "this was the wrong tradeoff."

Anthropic added an instruction to be less wordy. It made the model 3% worse at coding. I'm genuinely not sure which of those two outcomes is more on-brand for the AI industry in April 2026.

Three bugs. Each one individually forgivable. These are complicated systems. Things go wrong. The part that's a different category of problem is that all three reduced the quality of the same product over the same six weeks, while customers were paying full price, and while the vendor's official response was pointing back at the customers.

All three were resolved by 20 April in version 2.1.116. A month and a half after the first one shipped.

The Communication Failure

The bugs are the what. The communication pattern is the why this matters beyond Anthropic specifically.

In early March, developers started posting on Reddit, Hacker News, and X that Claude Code felt different. Shallower. Slower on complex tasks. Common report: things that took one prompt now took three.

Mid March, Anthropic's responses to these reports were broadly in the territory of "we don't see degradation in our metrics." The overall message being that the problem, if there was one, was probably on the user's end. Which is where the story gets uncomfortable to write about, because there's a specific word for what happens when someone in authority tells you that your clear perception of a thing is incorrect.

The word "gaslighting" comes from a 1944 MGM film, itself based on a 1938 British stage play by Patrick Hamilton. Ingrid Bergman notices the gas lights in their Victorian home flickering and dimming. Her husband, who has been secretly turning them down while searching the house for hidden jewels, tells her she's imagining it. The goal is to make her doubt her own perception to the point of questioning her sanity. The term entered psychology to describe any manipulation where the victim is made to question their own reality rather than the reality being corrected.

Developers noticed the lights flickering. Anthropic said they were imagining it. Developers wrote memos to themselves about why they'd suddenly gotten worse at their jobs.

That parallel is what drove the reaction when the postmortem dropped. Brent W. Peterson's Medium post, titled "Anthropic Breaks Claude and Gaslights Us," went viral in developer circles. It wasn't the most measured piece of writing (it's a Medium post about feeling gaslit, so), but it captured exactly why the community reaction was as heated as it was. The frustration wasn't really about the bugs. It was about being told, for weeks, that the bugs weren't there.

The AMD connection is worth a brief mention, and the timeline matters here. On 6 April (two weeks before the postmortem), Stella Laurenzo, Director of the AI group at AMD, published an analysis of 6,852 Claude Code sessions documenting the performance changes. The Register covered it the same day. An external director at a major hardware company had filed a data-backed bug report showing the degradation was measurable and real. The postmortem came 17 days later. When your tool's degradation is notable enough to show up in a hardware manufacturer's session analysis before you've acknowledged it publicly, "we don't see degradation in our metrics" becomes a hard sentence to stand behind.

Around the same time as the postmortem, developer Matt Shumer (@mattshumer_) put it this way on 24 April:

Matt Shumer

@mattshumer_

i'm a few days late to realizing this but:

wow, opus 4.7 is god awful

like so, so bad

it's making mistakes on things i'd expect gpt-4o to handle cleanly

there's got to be some explanation, right?

1.5K

24 Apr 2026

Brent W. Peterson's Medium post captured what a lot of developers were feeling: the frustration wasn't really about the bugs. It was about being told, for weeks, that the bugs weren't there. That's a fairly specific kind of achievement for a postmortem to have to clean up.

If your monitoring tells you everything's fine while paying customers tell you everything's broken, the question isn't "who's right." The question is "what's broken about your monitoring."

The seven-week gap between first developer reports (early March) and the full postmortem (23 April) is the number worth sitting with. That's not a one-day incident. That's a sustained period where the official answer and the user experience were in direct contradiction.

Worth noting: during this same period, Anthropic briefly tested removing Claude Code from the $20 Pro tier, forcing a $100 Max upgrade. They reversed it within 12 hours after the community reaction. In isolation it's a product decision. In the context of everything else happening in April, it added a layer of friction to an already strained relationship between the company and its paying users.

What This Means for Your AI Tooling Strategy

This isn't an Anthropic-specific problem. Let me be clear about that, because the point isn't to say Anthropic is uniquely bad at this. OpenAI has had similar incidents. Google's tools have had similar incidents. Any vendor building on top of rapidly-evolving AI infrastructure is going to have incidents.

The Claude Code postmortem matters not because Anthropic did something uniquely terrible, but because it's a well-documented example of a pattern that every business using AI tooling is exposed to. Here's what to do about it.

Don't build mission-critical workflows on a single AI vendor.

This doesn't mean you can't use Claude Code, or Cursor, or Copilot. It means you shouldn't architect your delivery process so that one vendor's performance change becomes your emergency. If Claude Code is down or degraded, can your team ship? If the answer is no, you've got a single point of failure dressed up as a productivity tool.

Think about it the same way you'd think about a hosting provider. You'd have a backup plan if your primary went down. AI tooling is now critical enough infrastructure to deserve the same thinking.

Build evaluation into your workflow.

If you can't measure whether your AI tool is performing better or worse than last week, you can't catch silent degradations. This doesn't have to be complicated. Build a small adversarial test suite of your ten most common tasks. Run it weekly. Log the scores. The first time the score drops noticeably, you'll know before the vendor tells you anything.

It's like having a smoke alarm. Most people's reaction when the alarm goes off is to take the battery out. The point is to listen to the alarm. The weekly test suite is the smoke alarm.

The developers who noticed Claude Code's degradation in March 2026 weren't wrong. They were just relying on their perception of quality rather than measured outputs, which meant they couldn't prove it and had to absorb weeks of "try adjusting your prompts" before the postmortem validated what they already knew.

Treat vendor postmortems as a feature, not a freebie.

Anthropic's postmortem here is actually quite good. It's detailed, it names specific dates, it admits "the wrong tradeoff." Not all vendors publish postmortems. Some vendors publish them and leave out the uncomfortable parts. The existence and quality of postmortems is a real signal about how a vendor treats its customers when things go wrong.

Start using this as a vendor selection criterion. Before committing to a tool, look up whether they publish detailed incident reports. If they don't, you're flying blind on their reliability.

Read your contract for performance guarantees.

Most AI vendor contracts have exactly zero performance guarantees. You're paying for access to the API, not for a defined level of output quality. That's worth understanding before you build production workflows on top of it. It's also worth understanding when you're evaluating what accountability looks like if something like this happens to you.

The Opus 4.8 release at the end of May is worth mentioning here. Anthropic explicitly addressed the comment-verbosity and tool-calling issues in the 4.8 release. It's the clean bookend to this story: six weeks of degradation, an admission, and then a model release that addresses the specific issues. That's a reasonable arc. The question is whether the middle part needs to take seven weeks next time.

Split-screen visualisation showing positive and negative developer reactions to a new Claude model release

Anthropic shipped Opus 4.8 in 41 days. The internet can't decide if it's a big deal.

Anthropic's newest model dropped 41 days after the last one. Developers are split three ways: some say it cured Claude's laziness, some say the...

Read full article

The Australian Business Angle

Australian development teams are heavy users of AI coding tools. Cursor, Claude Code, GitHub Copilot, Cody, and Codeium are all in active use across the sector. The tooling is embedded in daily workflows in a way that wasn't true even 18 months ago.

For an SMB-sized agency, the cost of silent vendor degradation isn't just developer frustration. It's slower delivery on client work. It's quality regressions in shipped product that you might not catch until someone else does. It's paying full price for partial value, which is annoying in the abstract and genuinely damaging when you're billing by the project or running tight margins.

The harder-to-measure cost is developers losing trust in their own judgement. That's what the Claude Code incident actually did to a lot of people in March. They noticed something was off, were told they were wrong, and started doubting themselves. For a few weeks, some very competent developers were writing memos about their declining skill level. That's not nothing. Confidence and clarity are part of what you're paying people for.

If you're a Webcoda-sized agency in Sydney and your devs are using Claude Code daily, the question isn't whether to keep using it. The question is whether you'd notice the next time it gets quietly worse. And whether you'd trust your own perception of that, or wait for a postmortem to confirm what you already knew.

There's a broader adjacency to this story worth naming briefly. We've written about Anthropic's operational transparency in the context of the Claude Code source leak earlier this year, which is a related but separate beat in the same broader question of how mature Anthropic's operations are as the company scales.

Abstract digital visualisation of source code flowing out of a broken container, representing the Claude Code npm leak

512,000 lines of Claude Code leaked via npm packaging error

Anthropic shipped their entire Claude Code source in every npm install. A missing config line exposed KAIROS, 44 hidden features, and a stealth mode...

Read full article

What Comes Next

The postmortem was published on 23 April. Opus 4.8 shipped in late May with specific fixes for the verbosity and tool-calling issues. The immediate engineering story has a reasonably clean ending.

The operational story is less resolved. A seven-week gap between user reports and vendor admission is a long time. Particularly when the product in question is a paid, production-grade tool that developers are using to ship real software for real clients.

Vendor trust and vendor capability are separate dimensions. Anthropic's models are excellent. The postmortem itself is excellent, honestly. The seven-week gap between "developers are reporting this" and "yes, we confirm" is what should sit uncomfortably with anyone building on their infrastructure.

I'm still using Claude Code. I'm just running it differently. I'm logging outputs. I'm comparing weeks. I'm not assuming the vendor will tell me when something's off, because the past six weeks of this story says they won't, even when they want to. The product I'm buying isn't just the model. It's the company that ships the model. Both are part of the deal.

Key Takeaways

From 4 March to 20 April 2026, three separate bugs degraded Claude Code's performance while users were told they weren't seeing what they were seeing
Bug one: reasoning effort silently downgraded from "high" to "medium" (4 March)
Bug two: caching bug wiped session memory every conversation turn (26 March)
Bug three: verbosity-reduction system prompt dropped coding performance 3% (16 April), described in the postmortem as "the wrong tradeoff"
All three bugs were fixed in v2.1.116 by 20 April
Anthropic published a detailed postmortem on 23 April
The seven-week gap between first user reports and full vendor admission is the number worth sitting with
Don't build mission-critical workflows on a single AI vendor
Build a test suite and run it weekly so you can catch silent degradations yourself
Treat vendor postmortem quality as a vendor selection criterion
Read your AI vendor contract for performance guarantees (you probably won't find any)

---

Sources

Anthropic Engineering. "April 23 Postmortem." 23 April 2026. https://www.anthropic.com/engineering/april-23-...
Fortune. "Anthropic Engineering Missteps, Claude Code Performance Decline, User Backlash." 24 April 2026. https://fortune.com/2026/04/24/anthropic-engine...
The Register. "Anthropic Says It Has Fixed Claude Code Degradation Issues." 23 April 2026. https://www.theregister.com/2026/04/23/anthropi...
VentureBeat. "Mystery Solved: Anthropic Reveals Changes to Claude's Harnesses and Operating Instructions Likely Caused Degradation." April 2026. https://venturebeat.com/technology/mystery-solv...
Axios. "Anthropic Claude Power User Complaints." 16 April 2026. https://www.axios.com/2026/04/16/anthropic-clau...
Peterson, Brent W. "Anthropic Breaks Claude and Gaslights Us." Medium. April 2026. https://medium.com/@brentwpeterson/anthropic-br...
Shumer, Matt (@mattshumer_). X post, 24 April 2026. https://x.com/mattshumer_/status/20477578362690...
The Register. "Claude Code has become dumber, lazier: AMD director." 6 April 2026. https://www.theregister.com/2026/04/06/anthropi... (Stella Laurenzo, Director of AI group at AMD, analysed 6,852 Claude Code sessions)
Cukor, George (dir.). "Gaslight." MGM, 1944. (Adaptation of Patrick Hamilton's 1938 stage play "Gas Light.")
InfoQ. "Anthropic Traces Six Weeks of Claude Code Quality Complaints to Three Overlapping Product Changes." May 2026. https://www.infoq.com/news/2026/05/anthropic-cl...

---