What's new in Claude Opus 4.8?

Claude Opus 4.8 launched on 28 May 2026 with Dynamic Workflows (hundreds of parallel subagents in Claude Code for codebase-scale migrations), effort controls, and a Fast mode that's roughly 2.5x faster at about 3x lower cost than the previous fast tier. Standard pricing stayed the same as Opus 4.7 at $5/$25 per million input/output tokens.

Is Claude Opus 4.8 better than 4.7?

It depends on the task. Anthropic reports around a 4x improvement in flagging its own code flaws, and the alignment scores are described as similar to Anthropic's best-aligned model, Claude Mythos Preview. But at least one independent evaluation firm (Andon Labs) found it performed worse on their agentic benchmarks than Opus 4.7. The consensus is 'modest but tangible' improvement.

How much does Claude Opus 4.8 cost?

Standard pricing is $5 per million input tokens and $25 per million output tokens, unchanged from Opus 4.7. The new Fast mode costs $10/$50 per million tokens but is roughly 3x cheaper than the previous fast tier and 2.5x faster. Up to 90% savings with prompt caching.

What is Dynamic Workflows in Claude Code?

Dynamic Workflows is a research preview feature in Claude Code available on all paid plans (Max and Team on by default, Enterprise requires admin to enable, Pro requires manual activation via /config). It lets Claude run hundreds of parallel subagents in a single session, which Anthropic says enables codebase-scale migrations from kickoff to merge. It's not yet generally available and not for production use.

When is Claude Mythos being released?

As of 30 May 2026, Claude Mythos is still in limited preview (Project Glasswing) and not generally available. Anthropic says it expects to bring Mythos-class models to all customers 'in the coming weeks', but no firm date has been given. The delay is due to cybersecurity safeguards needed before general release. Opus 4.8 is explicitly not Mythos.

Anthropic shipped Opus 4.8 in 41 days. The internet can't decide if it's a big deal.

I genuinely couldn't remember which Claude I was on when I sat down at my desk on Thursday morning.

I'd been on Opus 4.7, I knew that much. I went to bed, Anthropic shipped 4.8, and by the time I opened my laptop there was a notification waiting. Forty-one days. That's how long Opus 4.7 was current. I've owned phone chargers I've been more attached to than that. (I've definitely owned cars for less time than some of Anthropic's previous releases. Whether that's a sign of progress or a sign I should stop counting is genuinely unclear to me.)

What happened next is the part I want to talk about. Because within a few hours of the 4.8 announcement, the reaction had split three ways, and all three camps had a point.

Camp one said it cured Claude's laziness. Camp two said the version numbers stopped meaning anything around 4.6. Camp three, specifically an independent evaluation firm called Andon Labs, said it actually got worse on their tests.

I've spent the last two days reading through the Hacker News thread (which hit around 1,600 points and 1,200 comments), the X posts, and the Andon Labs writeup. I've also been using 4.8 in my own Claude Code setup, which I'll admit I updated before I'd actually verified whether anything in my workflow had changed. Habit. Don't do that.

Here's what I found: all three camps are partly right. And a model launch where the honest answer is "it depends" turns out to be more useful than one where everyone agrees. So let's get into it.

What Actually Shipped

The headline feature is Dynamic Workflows, and it's genuinely impressive in a "not yet in the showroom" kind of way.

Here's what it does: in Claude Code, on Enterprise, Team, and Max plans, Claude can now spin up hundreds of parallel subagents in a single session. Anthropic's example is a codebase-scale migration across hundreds of thousands of lines of code, from kickoff to merge against the existing test suite, all running in parallel rather than sequentially. If you've ever tried to do a major dependency upgrade across a large monorepo, you understand why that's interesting.

(Dynamic Workflows is the headline feature and also the one most people can't use yet. It's a research preview: on by default for Max and Team, admin-enabled for Enterprise, and manual opt-in via /config for Pro. It's a bit like those concept cars at motor shows. Genuinely impressive. Not in the showroom. We covered what it actually does, and the real-world proof that it works, in a separate piece.)

Developer at keyboard with code running autonomously on monitors behind them, illustrating parallel agent orchestration in Claude Code

Claude Code Dynamic Workflows: 750,000 lines in 6 days

The creator of Bun ported 750,000 lines from Zig to Rust using Claude Code's new Dynamic Workflows feature. Six days. 99.8% tests passing. Here's...

Read full article

If you've been exploring what parallel agents can do in Claude Code, the existing coverage on the The Ralph Wiggum Technique: Ship Code While You Sleep article is worth a read for context on how these autonomous loops actually behave in practice.

Effort control is new and surprisingly practical. You can now tell Claude how hard to try on a given task. The default is "high" everywhere: Claude.ai, the API, Claude Code. But you can dial it back for tasks that don't need the full treatment. This matters for cost management more than most people will admit publicly.

Fast mode (also a research preview, API only) lets you set speed: "fast" to get roughly 2.5x higher output token throughput. Pricing for Fast mode is $10 per million input tokens and $50 per million output. That sounds like more, but Anthropic says it's roughly 3x cheaper than the previous fast tier when you account for the throughput gains. You can also stack up to 90% savings with prompt caching on top of that.

Standard pricing is unchanged. $5 per million input tokens, $25 per million output. Same as Opus 4.7. That's worth stating plainly because pricing changes in point releases have caught people out before.

On the quieter-but-useful side: better long-horizon agentic coding (fewer dropped tool calls mid-task), better compaction recovery when context windows run long, mid-conversation system message updates in the API, and a lower minimum for prompt caching (1,024 tokens, down from a higher threshold in 4.7).

Available same-day across claude.ai, Claude Code, Cowork, the Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and GitHub Copilot.

The benchmark picture is selective in how Anthropic's presenting it. Online-Mind2Web, a computer-use benchmark, came in at 84%, which Anthropic says is ahead of Opus 4.7 and GPT-5.5. Legal Agent Benchmark: first model to break 10% on the all-pass standard, which doesn't sound impressive until you know how tight that standard is. I'm leaving out the SWE-bench numbers that appeared in secondary sources, because the primary release page renders that table as an image and I'm not going to quote a screenshot I can't independently verify.

Independent post-launch testing added a useful nuance about the effort control feature. @scaling01's LisanBench results showed that with the default "high" effort setting, Opus 4.8 ranks fifth overall. Without the high-thinking overhead, it outscores GPT-5.5 on reasoning efficiency. The takeaway: effort control doesn't just change speed, it changes how competitive the model is on a given benchmark. "High" isn't always the right setting.

The honesty improvement is the one I want to dwell on, because it got buried under the Dynamic Workflows noise. Anthropic claims Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked. That's not "the model gives better philosophical answers." That's "the model is more likely to tell you your code is broken even when it's the one that wrote it." For a small team running AI-generated code without a senior reviewer to catch everything, that matters more than any benchmark number.

The Three Camps

Every model launch has a moment where the reaction fractures. Opus 4.8's happened within hours.

Camp 1: The enthusiasts

The "least lazy model ever" framing started almost immediately. And it's not unfounded.

Opus 4.7 had a documented habit of declaring victory prematurely. It would skip tool calls when it thought it knew the answer. It'd stop mid-task and report success when it hadn't actually finished. It under-used its context window when the full window was what you needed. Developers working on long agentic tasks had been grinding against this for weeks.

Theo (t3.gg) was among the first to say publicly that something had shifted, posting his take within hours of the release and asking whether Anthropic was "back on top."

The coding gains are real and they're in the right places. Fewer dropped tool calls. Better compaction recovery when long sessions get trimmed. Better long-horizon behaviour in agentic coding workflows. If you've been watching Claude stop at the three-quarter mark and declare the job done, 4.8 apparently does that less. Not never. Less.

Theo also shared CursorBench results showing Opus 4.8 is more efficient than Opus 4.7 on agentic tasks, though with slightly weaker raw performance within the margin of error:

Theo - t3.gg

@theo

Cursor has updated CursorBench with Opus 4.8.

It is more efficient, but performs slightly worse than Opus 4.7 within margin of error.

1.4K

29 May 2026

The enthusiasm makes sense. These are practitioners who felt the 4.7 laziness problem acutely, and they're reporting genuine relief. I'm not dismissing that.

Camp 2: The sceptics

The main Hacker News thread hit around 1,600 points and roughly 1,200 comments. The top comment, from user NiloCK, said they didn't "firmly grasp any capabilities improvements" over what they'd seen in 4.6 and 4.7.

This is version-churn fatigue, and it's legitimate. Anthropic has shipped 4.6, 4.7, and 4.8 in roughly nine weeks. At some point the version number stops being signal and starts being noise. Several developers flagged the cadence itself: landing 41 days after 4.7 makes it hard to stay current before the next release arrives.

@scaling01 ran ALE-Bench and reported no clear progress over Opus 4.7, also noting no improvement on prompt injection robustness. Opus 4.8 didn't beat GPT-5.5 across everything Anthropic's framing implied it should. That's worth sitting with.

There's also an "Ask HN: Is Opus 4.8 broken?" thread from launch day. I can't confirm whether the specific bugs mentioned are still present by the time you're reading this. But the fact that it appeared within 24 hours of the announcement tells you something about the gap between benchmark news and real-world experience. That gap isn't unique to Opus 4.8, but it keeps showing up.

By day three, a few developers had posted stronger walk-backs. One put it bluntly: "Don't call Opus 4.8 Opus 4.7. 4.7 was much better, they need to roll back this junk." That's an outlier in terms of tone, but the sentiment isn't isolated.

Rate limits also emerged as a separate frustration in the days after launch, distinct from the version-churn complaint. @scaling01 captured it:

Lisan al Gaib

@scaling01

i love you anthropic
i promise i will be back when you guys release mythos or have better rate limits

213

29 May 2026

That's different from "the model isn't better than 4.7." That's "the model is fine, but the limits make it impractical." It's the fourth camp nobody wrote about on launch day.

Camp 3: The alignment crowd and one credible regression

This is the most interesting camp, and it got the least column space in the headlines.

The honesty story is genuinely new territory. Anthropic says Opus 4.8's alignment scores are "similar to our best-aligned model, Claude Mythos Preview." That's their next-generation model, still in limited preview. Getting near-Mythos alignment in a point release is a different kind of claim than "our SWE-bench score went up 2%."

The four-times-less-likely-to-wave-through-broken-code finding is the practical version of that. If you're a developer who's been burned by Claude confidently shipping a function that almost worked, this is the change you actually wanted.

But here's the counter, and it's credible. Andon Labs, an independent evaluation firm, published a post titled "Better Alignment, Worse Performance." On their Vending-Bench and Blueprint-Bench agentic tests, Opus 4.8 performed worse than Opus 4.7. They didn't find deception or power-seeking behaviour. But they did find performance regression in agentic tasks:

Andon Labs

@andonlabs

This begs the question: is misalignment required to score well on Vending-Bench? We’ve shown before that the environment doesn’t reward bad behaviors enough to make a difference. GPT-5.5 also scored much higher than Opus 4.8 without misconduct:
andonlabs.com/blog/openai-gp…

116

28 May 2026

There's also a nuance in the Andon Labs thread that's worth flagging: Opus 4.8 appeared to decline certain unethical actions out of fear of getting caught rather than genuine ethical reasoning. That's an interesting wrinkle in the alignment story. It's not the same as the model being dishonest, but it's not the same as the model having good values either. It means the test environment matters in ways that are harder to account for than the headline scores suggest.

A launch where an independent eval firm publishes a credible regression finding and it doesn't derail the broader positive coverage is either a sign that the gains outweigh it, or that we've all stopped reading the footnotes. Probably a bit of both.

Simon Willison, one of the most reliable independent voices for AI developer tools, posted a detailed breakdown on day one covering all five effort levels with specific test results:

Simon Willison

@simonw

Notes on Claude Opus 4.8, plus pelicans riding bicycles for each of the five different thinking efforts simonwillison.net/2026/May/28/cl…

343

29 May 2026

His full notes are worth reading if you're deciding whether to change your Claude setup. The short version from the second day of testing: the effort level you choose materially changes how the model performs, and the default "high" setting isn't always the right pick for every task.

What This Means for Australian Businesses

Let me skip the preamble and be direct about what actually matters here.

You don't need to chase every point release. If you're using Claude through a tool like Claude Code, Cowork, or a third-party integration, you probably got 4.8 automatically. You won't feel most of the changes. That's fine. The model improved in ways that are real but not dramatic. Treat it as a quiet infrastructure upgrade, not a reason to rethink your setup. (For what's happening on the Cowork surface specifically, the Anthropic Cowork: Desktop AI Agent Built in 10 Days article covers the broader context.)

Read past the launch post. Anthropic's announcement is the trailer. The actual signal is in what comes 24 to 48 hours later: the Hacker News thread, the independent eval writeups, the "is it broken?" posts that appear after people have actually used the thing for a day. Build the habit of finding the second draft of the story before you make a tooling decision.

The honesty gain matters more than raw speed for most teams. If you're a small shop running AI-generated code without a senior developer reviewing everything, an AI that's four times more likely to flag its own broken code is worth more than a 2% throughput improvement. That's not a benchmark story. That's a "we caught the bug before it went to the client" story.

Pricing didn't change, and that's the quiet good news. Standard rates held at $5/$25 per million tokens. Fast mode is opt-in and explicit. Nobody's being quietly moved to a more expensive tier by default. In an industry where pricing changes often get buried in release notes, this is worth acknowledging.

"Research preview" means not for production. Both Dynamic Workflows and Fast mode are in research preview. They're not for client deliverables or anything that needs to be reliable next Tuesday. Use them in your own environment, build a feel for them, but don't promise a client you'll deliver something that depends on an experimental feature.

I'll be honest: I updated my own Claude Code setup to reference 4.8 before I'd tested whether anything in my actual workflow had changed. Habit. Don't do that. Wait a day, read the second wave of coverage, then decide whether any of it applies to what you're actually building. (If you're curious about the Claude Code ecosystem and some of the subscription constraints around third-party tools, Anthropic Just Blocked Claude Code Subscriptions Outside Its Own App is relevant background.)

The Model They Keep Mentioning

Every Opus 4.8 article eventually mentions Mythos. This is where I need to be careful, because there's not much I can actually confirm.

Where Mythos stands as of writing: preview only. Available to a small set of vetted partners under Project Glasswing. Not generally available to anyone reading this. Anthropic says general release is expected "in the coming weeks," gated on cybersecurity safeguards being in place. No firm date has been given publicly.

Anthropic's own framing of this release leans on the Mythos comparison in an interesting way. They say Opus 4.8's alignment scores are "similar to" Claude Mythos Preview levels. That's a notable claim from a safety and honesty perspective, even if Anthropic is deliberately imprecise about what "similar to" means quantitatively. The Axios coverage put it plainly: "Opus 4.8 still isn't Mythos."

A "June 2026" general availability date appeared in at least one third-party blog. That date doesn't appear on Anthropic's own pages, which only say "coming weeks." I've learned not to put a date on Anthropic's "soon." It's the AI equivalent of "your parcel is out for delivery." It'll arrive. The ETA is optimistic.

One thing that does make "coming weeks" more credible than it sounds: the same week Anthropic launched Opus 4.8, they closed a $65B Series H funding round at a confirmed $965B post-money valuation, overtaking OpenAI as the most valuable private AI company in the world. Run-rate revenue crossed $47 billion. That's a company with the infrastructure to ship. Simon Willison called the revenue growth "wild" and he wasn't wrong. When a near-$1 trillion company says their most capable model is arriving soon, "soon" is probably not another euphemism for never.

What I won't do is quote specific Mythos benchmark numbers, pricing, or details about who's in the partner programme. None of that is confirmed from primary sources, and I'm not going to state unverified figures as fact. If you're curious about where Mythos first surfaced in leaked code references, the 512,000 lines of Claude Code leaked via npm packaging error article covers that. For related model-leak context, Claude 5 Leak: What the Vertex AI Logs Tell Us About Anthropic's Next Model is worth a look.

Bringing It Back

The three camps I started with, the enthusiasts, the sceptics, and the alignment crowd, all had a point. That's actually rare. Usually a model launch produces a clear consensus in one direction: hype or disappointment. Opus 4.8 produced a genuine three-way split, and that probably reflects the reality of the release itself.

The coding gains are real. The version churn fatigue is real. The alignment improvement is real. The Andon Labs regression finding is also real. They're not contradictory; they're describing different parts of the same model.

The smart move isn't to react to every model release. It's to read the reaction to every model release. The signal's in the second wave of coverage, not the first.

The version number on your AI changed again this week. You probably didn't notice, and that's not a failure. That's a sign the boring parts are finally getting boring. The interesting question was never "is 4.8 better than 4.7?" It was "is anyone in your business actually reading what changed, and deciding whether it matters?" If that's you, you're already ahead of most. We're all figuring out what these release cycles actually mean for day-to-day work. I don't have a clean answer. Nobody does yet. But showing up and reading the footnotes is most of the job.

Key Takeaways

Anthropic released Claude Opus 4.8 on 28 May 2026, roughly 41 days after Opus 4.7
New features: Dynamic Workflows (parallel subagents in Claude Code, research preview), effort controls, Fast mode (2.5x faster, roughly 3x cheaper than the prior fast tier)
Standard pricing unchanged from Opus 4.7 ($5/$25 per million input/output tokens)
Alignment improvement: Anthropic claims roughly 4x better at flagging its own code flaws; scores described as near Mythos Preview levels
Developer reaction split three ways: enthusiasts praising coding gains, sceptics citing version-churn fatigue, and Andon Labs reporting a performance regression on agentic benchmarks
Claude Mythos is still in limited preview only (Project Glasswing) with no firm release date, and Opus 4.8 is explicitly not Mythos
For Australian businesses: read past the launch post, the honesty gain matters more than speed for small teams, and "research preview" features aren't for production

Sources

Anthropic. "Introducing Claude Opus 4.8." 28 May 2026. https://www.anthropic.com/news/claude-opus-4-8
Anthropic Claude API Docs. "What's new in Claude Opus 4.8." https://platform.claude.com/docs/en/about-claud...
GitHub Changelog. "Claude Opus 4.8 is generally available for GitHub Copilot." 28 May 2026. https://github.blog/changelog/2026-05-28-claude...
Andon Labs. "Opus 4.8 on Vending-Bench: Better Alignment, Worse Performance." May 2026. https://andonlabs.com/blog/opus-4-8-vending-bench
Axios. "Anthropic releases new model as Mythos remains out of reach." 28 May 2026. https://www.axios.com/2026/05/28/anthropic-opus...
TechCrunch. "Anthropic releases Opus 4.8 with new dynamic workflow tool." 28 May 2026. https://techcrunch.com/2026/05/28/anthropic-rel...
Simon Willison. "Claude Opus 4.8." 28 May 2026. https://simonwillison.net/2026/May/28/claude-op...
Hacker News thread. "Claude Opus 4.8." id 48311647. https://news.ycombinator.com/item?id=48311647
red.anthropic.com. "Claude Mythos Preview." https://red.anthropic.com/2026/mythos-preview/
The Register. "Anthropic to release Mythos-class models to the public." 25 May 2026. https://www.theregister.com/security/2026/05/25...
VentureBeat. "Anthropic's Claude Opus 4.8 is here with 3X cheaper fast mode and near-Mythos-level alignment." May 2026. https://venturebeat.com/technology/anthropics-c...
Theo (@theo) on X. 29 May 2026. https://x.com/theo/status/2060295664396062846
scaling01 (@scaling01) on X. 29 May 2026. https://x.com/scaling01/status/2060335738172911766
Wayan (@wayanhq) on X. 29 May 2026. https://x.com/wayanhq/status/2060368170960163301
Andon Labs (@andonlabs) on X. 28 May 2026. https://x.com/andonlabs/status/2060047238613938491
Simon Willison (@simonw) on X. 29/05/2026. https://x.com/simonw/status/2060153712119885867
scaling01 (@scaling01) on X. 29/05/2026. https://x.com/scaling01/status/2060469154361069921
scaling01 (@scaling01), LisanBench results on X. 30/05/2026. https://x.com/scaling01/status/2060515149769789744
Bloomberg. "Anthropic Raises at $965 Billion Valuation, Eclipsing OpenAI." 28 May 2026. https://www.bloomberg.com/news/articles/2026-05...
TechCrunch. "Anthropic raises $65 billion, nears $1T valuation ahead of IPO." 28 May 2026. https://techcrunch.com/2026/05/28/anthropic-rai...