I was going to start this piece with a benchmark. Something like "Claude Code can process X tokens per minute" or "Dynamic Workflows reduced migration time by Y percent in internal testing." That's the kind of number AI announcements usually lean on, and it usually means very little because the conditions are controlled, the tasks are toy problems, and there's no consequence if something goes wrong.
Then I read what Jarred Sumner actually posted, and I put the benchmark draft in the bin.
Jarred built Bun. If you've worked in JavaScript at any point in the last few years, you've almost certainly heard of it: a fast JavaScript runtime that competes with Node.js, written largely in Zig. He's not a hobbyist. He's not a researcher. Bun has real users, real production workloads, and a test suite that has to keep passing or those users notice immediately.
He used Dynamic Workflows, a new Claude Code feature, to port 750,000 lines of that codebase from Zig to Rust. He described it as taking 6 days in his own post. Anthropic's blog puts the full project at 11 days from first commit to merge. I'll come back to the gap between those two numbers, because I think it matters.
Either figure made me stop and re-read the tweet. But the 6-day claim from Jarred himself is the one that cut through.
This isn't an AI benchmark. It's a developer with genuine skin in the game, a production codebase he depends on, and a test suite measuring whether the output actually works. 99.8% of tests passing isn't "the AI produced code that looks plausible." It's "the AI produced code that runs." That distinction is everything.
I've been deliberately careful about "AI changes everything" framing. I've been wrong to dismiss things before (I underestimated GitHub Copilot early on, which I'm still quietly embarrassed about). I try not to repeat that pattern. But 750,000 lines in something between 6 and 11 days, from the person who built the thing? I can't brush that aside. So let's look at what actually happened, and what it means for the rest of us.
---
The Bun Story
Zig and Rust are both systems programming languages. Porting between them isn't like porting a Python script to JavaScript, where a lot of the semantics carry across and the main work is syntax translation. Zig and Rust have fundamentally different approaches to memory management and ownership. You can't search-and-replace your way through 750,000 lines and call it done. Every time ownership changes hands in Rust, the compiler is checking whether you've got it right, and Zig doesn't think about ownership the same way at all. It's a real intellectual problem at scale, not just a mechanical one.
That's the context for why Jarred's result is worth talking about.
What Jarred pointed to wasn't just Dynamic Workflows in isolation. He specifically called out adversarial code review as part of what made it work. Some agents were tasked with finding problems in what other agents produced. That's not a bonus feature. That's architecture. The system doesn't assume its own outputs are correct; it builds in a process for catching mistakes before they compound into the kind of result that fails your test suite.
And then there's this, which is the quote I keep coming back to:
"State of the art today for reliably using agents to complete medium to large projects." From the person who just finished using it on a three-quarters-of-a-million-line codebase. One caveat: Anthropic acquired Bun in December 2025, so Jarred isn't a disinterested outsider. Even so, he's reporting what he found after a project he ran himself, on a codebase his users depend on.
Now, the numbers. Jarred said 6 days. Anthropic's blog says 11 days from first commit to merge. I don't think these contradict each other. The most likely explanation is that 11 days is the full project timeline, including the setup phases, the initial planning, the days where Dynamic Workflows was ramping up rather than running at full pace. Jarred's 6 days is probably the active Dynamic Workflows portion, the phase where the parallel agents were actually running across the codebase. He was there. I'll use his figure as the headline because he's the primary source, but Anthropic's 11 days is the honest project-level number, and it's worth knowing both.
Klarna used Dynamic Workflows for discovery and review across large codebases, identifying dead code and flagging things that had quietly stopped being necessary. CyberAgent used it to fill the gap between running a single subagent and managing a full agent team. These are more typical business use cases than a 750K-line runtime rewrite, which matters because it means the feature isn't only for extreme scenarios. But Jarred's project is the one that made the case in public, with real numbers, from a real system that people rely on.
(A note on the line count: Anthropic's blog describes "roughly 750,000 lines of Rust." The actual PR diff shows over a million lines added in total. Third-party sources broadly cite around 960,000. The 750K figure is Anthropic's characterisation of the AI-generated Rust output specifically.)
(If you want more context on what Claude Code has been capable of for individual developers, [article:google-engineer-claude-code-one-hour-year-project-2026] is worth reading. That piece covered a Google engineer completing a year's worth of work in an hour with Claude Code. The thread of evidence here is getting harder to wave away.)
---
How It Actually Works
Let me try to explain this in plain English, the way I'd explain it to a client who's not a developer.
You give Dynamic Workflows a complex task. Not "write me a function" complex. More like "audit this entire codebase for security vulnerabilities" or "migrate all these files from one framework to another" complex. The kind of task where, if you asked a junior developer to do it, you'd expect it to take weeks.
Instead of Claude working through it sequentially (read file, process file, move to next file, repeat for ten thousand files), it writes an orchestration script first. That script fans the work out across parallel subagents. Up to 16 run simultaneously, with a cap of 1,000 total agents per workflow run. So when Anthropic says "tens to hundreds," they mean across the full run rather than all at once. (That's still a meaningful amount of parallel work by any measure.)
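To make the "16 at once, 1,000 per run" distinction concrete, here is a minimal sketch of a capped fan-out in Python. It is not Anthropic's orchestration script, which I haven't seen. The function names and the sleep standing in for real work are assumptions made for illustration; only the two limits come from the figures above.

```python
import asyncio

MAX_CONCURRENT = 16      # simultaneous subagents (figure quoted above)
MAX_TOTAL_AGENTS = 1000  # total agents per workflow run (figure quoted above)


async def run_subagent(unit: str) -> str:
    """Hypothetical stand-in for one subagent handling one unit of work."""
    await asyncio.sleep(0.01)  # placeholder for the real work
    return f"done:{unit}"


async def fan_out(units: list[str]) -> tuple[list[str], int]:
    """Run one subagent per unit, never more than MAX_CONCURRENT at a time.

    Returns the results in input order plus the peak concurrency observed.
    """
    if len(units) > MAX_TOTAL_AGENTS:
        raise ValueError(
            f"{len(units)} units exceeds the {MAX_TOTAL_AGENTS}-agent cap"
        )

    gate = asyncio.Semaphore(MAX_CONCURRENT)
    running = 0
    peak = 0

    async def guarded(unit: str) -> str:
        nonlocal running, peak
        async with gate:  # waits here while 16 subagents are already busy
            running += 1
            peak = max(peak, running)
            try:
                return await run_subagent(unit)
            finally:
                running -= 1

    results = await asyncio.gather(*(guarded(u) for u in units))
    return list(results), peak
```

Calling `asyncio.run(fan_out(files))` with a few hundred file names works through all of them, but the observed peak never exceeds 16. That is the sense in which "hundreds of agents" and "16 at a time" are both true of the same run.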
Here's the part that's easy to miss in the marketing copy: some of those subagents are specifically tasked with adversarial verification. Their job isn't to produce output. Their job is to try to find problems with what the other agents produced. The architecture doesn't assume the building agents will get everything right. It builds in agents whose entire purpose is to catch mistakes, and those verification agents run in parallel with the building agents rather than waiting until everything is done.
(This is roughly how you'd run a large engineering project with a team of junior developers. You don't trust any single output unchecked. You build in review steps. You have some people writing and some people auditing what the writers produced. The difference with Dynamic Workflows is that the auditing agents can run at the same time as the building agents, in parallel across different sections of the codebase, in a way humans physically can't do simultaneously. That's the architectural leap.)
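The build-and-audit arrangement can be sketched in a few lines. This is an illustration under stated assumptions, not how Dynamic Workflows is implemented: `build`, `review`, and the deliberately flawed "allocator" section are all hypothetical. The point is only that each section's review starts as soon as its own draft exists, while other sections are still being built.

```python
import asyncio


async def build(section: str) -> dict:
    """Hypothetical builder agent: produces a draft for one section."""
    await asyncio.sleep(0.01)  # placeholder for the real work
    # Simulate one flawed draft so the reviewer has something to catch.
    return {"section": section, "ok": section != "allocator"}


async def review(draft: dict) -> dict:
    """Hypothetical adversarial reviewer: tries to find a problem in a draft."""
    await asyncio.sleep(0.01)  # placeholder for the real work
    return {"section": draft["section"], "flagged": not draft["ok"]}


async def build_then_review(section: str) -> dict:
    # The review for this section begins the moment its draft is ready,
    # without waiting for the builders of other sections to finish.
    draft = await build(section)
    return await review(draft)


async def run_workflow(sections: list[str]) -> list[str]:
    """Return the sections whose drafts a reviewer flagged."""
    verdicts = await asyncio.gather(*(build_then_review(s) for s in sections))
    return [v["section"] for v in verdicts if v["flagged"]]
```

Running `asyncio.run(run_workflow(["parser", "allocator", "http"]))` returns only the flagged section. A real reviewer would be another model call with instructions to look for faults, not a boolean check.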
Progress saves continuously within your Claude Code session. If something goes wrong mid-workflow, a dropped connection or an unexpected interruption, the job picks up from the last checkpoint rather than starting over. One important caveat: if you close Claude Code entirely and relaunch it, the workflow starts fresh. The checkpoint system is session-scoped, not persistent across sessions. On a job that might run for hours within a session, that continuity still matters enormously. Just don't plan to close Claude Code halfway through and come back tomorrow.
Now, about the cost. Jarred was honest about this in his own thread.
Real numbers have started coming in from developers who've actually run workflows. A developer building a UI component library spent $22.75 across 1 hour 25 minutes. Another spent $110 just testing the model's agentic coding claims on a microservices project. One developer watched 79 of 104 agents run in the first 6 minutes, burning 1.2 million tokens. Another reported 3 million tokens across 60 agents on a single task. One developer on the $200 Max plan hit the API's usage limits twice in a row and concluded it "wouldn't last me a week." These aren't edge cases. They're representative of what developers are reporting after trying it.
The most useful data point I've seen came from @CommerceJohn, who noted that supplying ADRs changes how you should approach the feature entirely.
ADRs are Architecture Decision Records, documents that describe your system's structure, patterns, and constraints. The insight is that without them, Claude explores your codebase to understand it before doing anything useful. With them, it plans. The difference isn't marginal. If this holds for your codebase, it's the most practical cost-reduction advice in this piece.
You can give Claude a ballpark token budget before the workflow starts. Anthropic describes Dynamic Workflows as involving "substantially higher token consumption than typical sessions" but hasn't published a specific multiplier, and I'm not going to make up a number. What I will say is that the spending starts before the real work does. Before a single line of your codebase gets analysed, Dynamic Workflows has already generated an orchestration script, spun up the initial agent batch, and fired the first round of adversarial reviewers. That overhead exists regardless of how large your actual task is. One developer noted that a workflow burned through tokens to produce a 58-line plan.md file. The output was tiny, but the orchestration machinery ran at full cost to get there.
In other words, the orchestration overhead doesn't shrink with the task. For a 750,000-line migration, the overhead is a rounding error. For a small exploratory task without ADRs, it can be most of the bill. There's a confirmation step before your first workflow runs for this reason. Don't skip reading it.
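The arithmetic behind "rounding error versus most of the bill" is easy to sketch. The per-agent averages below use the token counts developers reported above. The overhead and per-file figures in the second function's example are invented placeholders to show the shape of the curve. They are not Anthropic's numbers or anyone's measurements.

```python
def tokens_per_agent(total_tokens: int, agents: int) -> float:
    """Average tokens consumed per agent in a reported run."""
    return total_tokens / agents


def overhead_share(fixed_overhead: int, per_unit: int, units: int) -> float:
    """Fraction of the total bill taken by a fixed orchestration overhead."""
    total = fixed_overhead + per_unit * units
    return fixed_overhead / total


# Reported runs from above: 1.2M tokens across 79 agents, 3M across 60.
run_a = tokens_per_agent(1_200_000, 79)  # about 15,190 tokens per agent
run_b = tokens_per_agent(3_000_000, 60)  # exactly 50,000 tokens per agent

# HYPOTHETICAL placeholders: 200,000 tokens of overhead, 2,000 tokens per file.
small_task = overhead_share(200_000, 2_000, 10)      # 10 files
large_task = overhead_share(200_000, 2_000, 10_000)  # 10,000 files
```

With those placeholder inputs the overhead is over 90% of the small job and under 1% of the large one. The real proportions will differ, but the direction is the same for any fixed overhead.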
For reference, what Dynamic Workflows is actually suited to:
- Codebase-wide security audits or bug hunts where you need parallel analysis
- Large dependency upgrades or framework migrations spanning hundreds or thousands of files
- Dead code identification across a large service (the Klarna use case)
- Any task where independent parallel analysis is more reliable than a single sequential pass
What it's not suited to: small, well-scoped coding tasks. If you need to write a function or debug a specific error, standard Claude Code is faster and cheaper. Dynamic Workflows is for the projects that have been sitting on the backlog because the scope made them feel undoable in a normal sprint.
(If you've read about the Ralph Wiggum technique for getting Claude into single-agent autonomous loops, [article:ralph-wiggum-technique-claude-code-autonomous-loops-2026] gives the background on that. Dynamic Workflows is essentially that concept built into the product properly, with orchestration and adversarial checking that you'd have to construct manually otherwise.)
---
Who Can Use It Right Now
I want to be straightforward about this rather than vague.
Dynamic Workflows is in research preview. Research preview means Anthropic considers it functional enough to release, but not stable enough to call production-ready. That distinction matters when you're deciding whether to run it on something critical.
Access at the time of writing (31 May 2026):
- Max plan: yes, on by default
- Team plan: yes, on by default
- Enterprise plan: yes, but your admin needs to enable it (off by default)
- Pro plan: yes, but you need to manually activate it. Go to /config in Claude Code and enable it from there.
Available surfaces: Claude Code CLI, VS Code extension, Desktop app, Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry.
The "admin must enable it for enterprise" thing is deliberate. Anthropic knows that "substantially higher token consumption" multiplied across an enterprise team without anyone realising it could create a surprise bill. The default-off stance is them being thoughtful about that, not them burying the feature.
If you're on Max and want to try it: you'll get a confirmation step before your first workflow runs. I'd suggest starting with something bounded, not your most critical codebase. A legacy dead-code audit, an old repository you want to understand before you touch it, something where the cost of a mistake is low and the value of a good output would be clear. That's the sensible first attempt.
If you're on Team and considering a pilot: same advice. Pick a project where the scope has been daunting but the stakes aren't existential. Running an analysis of a service you've been maintaining for years is a better first test than migrating your production database layer.
For Australian businesses accessing Claude through integrations and third-party tools rather than the raw API: it depends on whether your vendor has enabled Dynamic Workflows in their surface. Worth asking directly. "Do you support Claude Code Dynamic Workflows, and is it enabled for our account?" is the specific question.
---
What Changes
I want to be careful here, because this section is where AI coverage usually turns into hype.
The old model for AI coding assistance is: Claude helps a developer write faster. The developer still runs the project. They write a function, Claude suggests improvements, the developer reviews and pastes it in. The developer's judgement is doing most of the work. Claude is a very fast autocomplete.
Dynamic Workflows shifts that model toward: Claude runs a project, developer reviews the output. And I want to be precise about what "review" means in that context, because it's not passive. Someone still has to decide what the project should accomplish. Someone still has to check whether the output actually does what you asked. Someone still has to be accountable if it doesn't. That someone is still you.
What changes is the nature of the work. Instead of writing code and using Claude to help, you're specifying what the code should do and using Claude to produce it. That's closer to senior engineering work than junior engineering work, which is a strange thing to say about a tool that's automating a lot of what used to be technically demanding.
The Klarna dead-code example is useful to think about here. Identifying dead code across a large codebase isn't a hard problem intellectually. It's tedious. You read a lot of code. You trace a lot of call paths. You write a lot of notes. Most teams have it on the backlog precisely because it's daunting in terms of time, not because anyone is confused about how to do it. Dynamic Workflows makes it tractable because you can run the analysis across the whole codebase in parallel, with agents checking each other's findings. The bottleneck was always hours, not intellect.
My honest position: I'm not moving Webcoda's client projects onto Dynamic Workflows today. Research preview status, unquantified token costs, and the enterprise admin requirement all tell me it needs more time to stabilise before I'd be comfortable putting client work through it. But Jarred's project changed how seriously I'm watching it. The thing that shifted for me wasn't the feature announcement. It was his tweet. Because he has 750,000 lines of code that people depend on, and he committed the result.
(One side note worth mentioning: the KAIROS background daemon that appeared in the Claude source code leak back in April pointed at exactly this kind of architecture: an always-on agent running background tasks autonomously. [article:claude-code-source-leak-kairos-hidden-features-2026] covered it in detail. Dynamic Workflows is starting to look like KAIROS turned into an actual product.)
If you want the broader context of how Opus 4.8's launch landed with developers (the three-camp reaction, the version-churn debate, and where Mythos actually stands), that's in our Opus 4.8 piece, "Anthropic shipped Opus 4.8 in 41 days. The internet can't decide if it's a big deal."
---
Closing
Let me bring it back to the specific.
Jarred Sumner. 750,000 lines. Zig to Rust. 6 days as he described it, 11 days as the full project from first commit to merge according to Anthropic's own blog. 99.8% of the test suite passing.
I've spent most of the last few years trying to maintain an accurate sense of what AI can actually do versus what gets announced with a lot of noise and then turns out to be considerably more limited in practice. I think that's the right instinct. Most of what gets called a category shift in AI turns out to be a genuinely useful tool that needs careful management, not transformation.
But I can't apply that instinct to a developer porting 750,000 lines with a 99.8% test pass rate. That's a different kind of evidence. Not a benchmark from a controlled environment. A project. A production project. With a test suite. And it passed.
Most of us don't have 750,000-line codebases or the budget for Max plans. I'm not suggesting everyone needs to rush to try Dynamic Workflows this week. But the question it's answering, whether AI can reliably run a large engineering project rather than just assist with one, is the question that matters most for how software development changes over the next few years. Jarred's project is the first time I've seen an answer I'd actually trust. I don't have all the answers about what that means for the rest of us yet. Nobody does. But I'm watching.
---
Key Takeaways
- Jarred Sumner ported 750,000 lines of Bun from Zig to Rust using Dynamic Workflows. He described it as taking 6 days; Anthropic's blog puts the full project at 11 days from first commit to merge. Both figures are significant.
- 99.8% of the test suite passed. That's the number that matters, not the timeline.
- Dynamic Workflows fans complex tasks out across tens to hundreds of subagents per run (up to 16 at once, capped at 1,000 per run), with adversarial verification built in (some agents specifically find problems with what other agents produced).
- Progress saves continuously within a Claude Code session, so interrupted jobs resume from the last checkpoint rather than starting over. Closing and relaunching Claude Code starts the workflow fresh.
- Anthropic describes "substantially higher token consumption than typical sessions." A confirmation step runs before your first workflow to make sure you're aware.
- Research preview, available now on all paid plans (Max, Team, Enterprise, and Pro). Enterprise requires admin to enable; Pro requires manual activation via /config.
- Available across Claude Code CLI, VS Code, Desktop, API, Bedrock, Vertex AI, and Microsoft Foundry.
- It's suited to codebase-wide audits, large migrations, dead code identification, and tasks where parallel independent analysis beats sequential work. It's not suited to everyday coding assistance.
- Research preview status means treat it as powerful but not production-ready for critical work.
---
Sources
- Anthropic, "Introducing Dynamic Workflows in Claude Code": https://claude.com/blog/introducing-dynamic-wor...
- Anthropic, "Introducing Claude Opus 4.8": https://www.anthropic.com/news/claude-opus-4-8
- Claude API Docs, What's New in Claude 4.8: https://platform.claude.com/docs/en/about-claud...
- Jarred Sumner (@jarredsumner), 750K-line Bun port tweet (333K views): https://x.com/jarredsumner/status/2060050578026...
- Jarred Sumner (@jarredsumner), "state of the art" endorsement: https://x.com/jarredsumner/status/2060050583017...
- Jarred Sumner (@jarredsumner), token cost explanation: https://x.com/jarredsumner/status/2060050597621...
- John Kennedy (@CommerceJohn), ADR cost reduction insight (47.4K views): https://x.com/CommerceJohn/status/2060129955493...
