What is loop-driven development in Claude Code?

Loop-driven development is the practice of giving an AI agent a verifiable goal and letting it iterate against that goal autonomously, rather than crafting a single perfect prompt and reviewing one response. Claude Code now supports the building blocks natively: /loop re-runs a prompt on a schedule (a fixed interval, an interval Claude picks, or a maintenance prompt), and /goal keeps the agent iterating until a defined success condition like passing tests is met, then stops. The /goal feature is the closest thing to the hand-rolled bash Ralph loop from mid-2025, productised with a proper stopping condition.

How is the Claude Code /goal feature different from the Ralph loop?

The Ralph loop is a bash one-liner that feeds the same prompt to Claude over and over with no built-in stopping condition. The /goal feature builds the loop into Claude Code itself: you define a success condition (tests, lint, end-to-end checks), and after each turn a separate fast evaluator model checks that condition against the work. Not met, it keeps going; met, it records that the goal's been achieved and stops. It self-verifies instead of relying on you to notice when the job is done.

What's the biggest risk with autonomous loops?

Verification. A loop can produce plausible-looking code that's wrong, and at scale it can produce it faster than a human can review it. A self-verifying loop can also satisfy a weak success condition without actually solving the problem (it 'tricks' the test). The current answer is adversarial verification: a separate reviewer agent that actively tries to break each change, plus strong real tests the agent can't easily game.

Do other AI coding tools have a loop or goal feature?

As of mid-2026, OpenAI Codex is reported to have a similar goal feature alongside Claude Code's, though Claude Code's /loop and /goal are the ones documented in detail. The pattern of native goal-oriented persistence (set a success condition, let the agent iterate until it's met) looks to be heading toward becoming a standard feature across the major coding tools rather than a hand-rolled hack.

The Loop People Were Right: Anthropic Shipped Their Argument as a Feature

I was in the loop camp. I should say that up front, because it changes how you should read the rest of this.

When the Ralph Wiggum loop surfaced in mid-2025, I wrote about it here, then twice more. I thought the idea was right even when the demos were rough, and I said so. So when a prominent, skeptic-leaning developer publicly conceded the point last week, I'd love to tell you I read it with calm professional detachment. I didn't. I read it the way you read a review that agrees with you. Which is exactly why I want to be careful in this piece, because the easiest argument to get wrong is the one you already believe.

Here's the concession that kicked it off:

Theo - t3.gg

@theo

I hate to admit it but the loop people were right

4.4K

17 June 2026

"I hate to admit it but the loop people were right." A hundred and fifty thousand views and well over a thousand quote-tweets, most of them piling on with their own version of the same admission. Theo isn't a hype merchant. He's the kind of developer who pokes holes in things for a living, which is what made the line land. Six months ago this stuff was a fringe technique with a cartoon name. Now a respected critic is conceding, and the bigger tell sits underneath the meme: Anthropic shipped the argument as a feature.

Usual disclosure: we use Claude every day at Webcoda, and this site's tooling is built on it. Factor that in, especially here, because this article argues Anthropic got something right. I'd rather you read it with that scepticism than without it.

If you haven't read the original piece on the Ralph technique, start there. This one assumes you know roughly what a loop is. It's the deep dive on what happened next.

Ralph Wiggum from The Simpsons sleeping at a keyboard while code runs successfully on monitors behind him, illustrating the autonomous AI coding technique.

The Ralph Wiggum Technique: Ship Code While You Sleep

A developer left Claude Code running for three months. It built a working compiler. Here's the absurdly simple technique that's changing how...

Read full article

The split that actually happened in 2025

To understand why this matters, you have to remember how divided the room was.

By late 2025, the developer community had quietly sorted itself into two camps over how you actually get good work out of a coding agent. Nobody called them camps. People rarely admit they're in a camp. But the split was real.

The first camp believed in the prompt. Get the context right, scope the task carefully, write the instruction precisely, keep a human reviewing each step, and the model produces good code. The craft was in the asking. This is a sensible position held by serious people, and for most of 2024 and early 2025 it was simply correct. The models weren't reliable enough to trust without a tight rein, and a sloppy prompt got you confidently wrong output fast.

The second camp believed in the loop. The insight, mostly traced to Geoffrey Huntley (an Australian, for what it's worth, which I enjoy), was that you stop trying to nail one perfect prompt. Instead you define a goal the work can be checked against, then let the agent iterate at it. Fail, look at what broke, fix, repeat. The craft moves from the instruction to the conditions: what does "done" mean, and how does the machine know it's done?

The loop camp got mocked, and not unfairly. The flagship technique was literally a bash while loop named after the Simpsons kid who eats glue. Here's a clean summary of the actual mechanism, plus Huntley's rule that became a slogan:

Sam Cui

@samcmkt

Ralph loop — a dead-simple way to let an AI coding agent run itself until the whole spec is done. It's the technique behind all the "loop engineering" noise right now.

Ralph was coined by Geoffrey Huntley. Google's Addy Osmani gave the wave its name — loop engineering. Even

13 June 2026

while :; do cat PROMPT.md | claude-code ; done. That's the whole thing. Same prompt, fed in over and over, no memory that the task was finished, brute force by repetition. Huntley's rule was "sit on the loop, not in it," meaning your job is to set it running and supervise from above, not to babystep each iteration.

You can see why smart people doubted this. It looks like throwing compute at a wall. And plenty of early attempts produced exactly the mess you'd expect: loops that span forever, burned tokens, and committed garbage with great confidence. The clearest before-shot I can give you is this, from January:

Shaw (spirit/acc)

@shawmakesmagic

For the record we did Ralph almost immediately after coding agents starting working

It doesn’t work well

It was the first thing I tried obviously

It is a hype larp, not an ideal strategy for Claude code or anything

I hope you learned your lesson

- someone who has been coding

320

22 Jan 2026

"It doesn't work well. It was the first thing I tried obviously. It is a hype larp." A prominent developer, in plain language, calling the whole thing a costume. In January, that was a defensible read. The technique was fragile, the tooling was nonexistent, and the wins were anecdotal. If you'd told me then that the bash loop would become a first-class product feature inside six months, I'd have said you were overexcited.

I'd have been wrong. Here's what changed.

Anthropic shipped the argument

The thing that settles an internet debate isn't usually a better argument. It's a product decision.

In June 2026, Claude Code's native /loop and /goal features turned the hand-rolled hack into supported features. The dumb bash loop became something you invoke with a slash command, with proper stopping conditions and tool integration behind it. (Boris Cherny calls it "/loops" in the clip below; the command in the docs is /loop. Same idea, don't let the plural throw you.) That alone would be notable. What made it impossible to ignore was who started talking about it.

CyrilXBT

@cyrilXBT

Claude Code creator Boris Cherny.

30 minutes.

Three quotes that reframe everything:"100% of our pull requests at Anthropic are run by Claude Code.""We deleted 50% of the system prompt when the new models dropped.""/loops is my favorite feature today.

I'm not prompting

100

16 June 2026

That's Boris Cherny, who created Claude Code, on record: "/loops is my favorite feature today. I'm not prompting Claude anymore. I'm building loops." Read that again, because it's the whole article in two sentences. The person who built the tool, working inside the company that makes the model, isn't writing careful prompts anymore. He's defining loops and letting them run.

You can argue about a meme tweet from a critic. It's harder to argue with the tool's own author changing how he works, and then changing the product to match. When the people closest to the model start building loops instead of prompts, the "perfect prompt" camp has a problem that no amount of debate fixes. The default moved.

I want to be precise about what "the default moved" means, because it's easy to overstate. Prompting didn't die. You still write a clear instruction to start a loop, and a vague goal still produces vague results. What changed is where the effort goes. Less of it goes into the wording of a single request. More of it goes into defining the conditions the work iterates against. That shift is the substance of the vindication, and it's why the feature releases matter more than any single quote.

The four mechanisms, from caveman to engineered

Here's where most of the confusion lives, so I want to draw the lines carefully. "Just let it loop" covers at least four distinct things now, and they sit at rising levels of sophistication. They're the same idea (iterate against a verifiable goal instead of perfecting one prompt) built four different ways.

1. The Ralph loop (mid-2025): the caveman version

while :; do cat PROMPT.md | claude-code ; done. We've covered it. It's the dumbest possible implementation and that's the point. Same prompt every time, no awareness that the task is done, no real stopping condition beyond you noticing and killing it. The intelligence isn't in the loop, it's in the fact that each iteration sees the files and git history the previous one left behind. The loop is just persistence. It works, it's brittle, and it taught everyone the lesson the next three mechanisms built on.

2. Native /loop: scheduled iteration, managed by the tool

Here's a distinction worth getting right, because the two native features do different jobs and people muddle them. /loop is the scheduler. It took the "keep running" half of what people were hand-rolling and made it native: instead of a shell loop spinning forever, Claude Code re-runs a prompt on a cadence you set. That can be a fixed interval (every fifteen minutes, every morning), an interval Claude picks itself based on the work, or a built-in maintenance prompt. The state that matters lives in files, git, and tests rather than crammed into one session's context window, so each run picks up where the last left off.

It's persistence with a clock attached, and far less brittle than a bash loop with no off-switch. Defined cadence, proper tool integration, no runaway while spinning at 3am because you forgot to kill it. What /loop deliberately doesn't do is decide when the job is finished. That's the other feature's job.

3. /goal: iterate until a success condition is met

This is the one that makes the vindication concrete, and the cleanest description I've seen came from the community before I could write a better one:

ClaudeDevs

@ClaudeDevs

It's the Ralph loop, built into Claude Code. Every time Claude tries to stop, it checks your condition against the transcript.

Not done? It keeps going. Done? You get a "Goal achieved" summary.

312

13 May 2026

That community summary captures the spirit, though the real mechanism is a touch more deliberate than "checks the transcript." You define a success condition, real tests passing, lint clean, an end-to-end check going green, and after each turn a separate fast evaluator model judges whether that condition holds and feeds the answer back. Not satisfied, the agent keeps working. Satisfied, the run records the goal as achieved and stops. The detail that matters there is the separate evaluator: the model doing the work isn't the model grading it. Anthropic clearly knew that letting an agent mark its own homework was the obvious failure mode, and split the judge out from the worker.

The leap from /loop to /goal is the leap from "keep going on a schedule" to "keep going until this specific, checkable thing is true." That's the loop camp's entire thesis, packaged as a feature. The craft was never the prompt. It was the stopping condition. /goal is Anthropic agreeing in code. OpenAI's Codex is reported to have shipped something similar, which, if it holds up, tells you this is an industry direction and not a one-vendor bet.

4. Dynamic Workflows: orchestrated parallel agents with adversarial verification

The most engineered version, and the newest. Dynamic Workflows landed as a native Claude Code feature only weeks ago, and there's now an even more aggressive orchestration setting layered on top of it called ultracode mode. Both are recent arrivals, so be a bit careful about the order of events. People were rigging the equivalent by hand long before either shipped: spinning up multiple agents, scripting the coordination between them, bolting on their own review passes. We did rough versions of it ourselves. The native features didn't invent the pattern, they productised what the loop crowd had already been cobbling together manually. I covered the Dynamic Workflows version in full at the end of May, so I won't relitigate the whole thing here.

Developer at keyboard with code running autonomously on monitors behind them, illustrating parallel agent orchestration in Claude Code

Claude Code Dynamic Workflows: 750,000 lines in 6 days

The creator of Bun ported 750,000 lines from Zig to Rust using Claude Code's new Dynamic Workflows feature. Six days. 99.8% tests passing. Here's...

Read full article

The short version: instead of one loop, you orchestrate many agents in parallel, and crucially, some of them are tasked with finding faults in what the others produce. That adversarial verification piece is the important bit for this article, and I'll come back to it, because it's a quiet admission that "just loop it" was never enough on its own.

The spine across all four: Ralph was the caveman with a rock, one process doing persistence and stopping by hand. Anthropic split those two jobs out. /loop took the persistence and gave it a clock. /goal took the stopping and gave it a checkable condition. Dynamic Workflows runs a whole crew and makes some of them sceptics. Rising sophistication, one underlying bet. The bet paid off.

The honest counterargument: was it the loop, or just the models?

Now the part where I try to argue myself out of my own position, because if I don't, this piece reads like a press release and you should stop trusting it.

The strongest case against me is this: maybe the loop people weren't right about anything except timing. Maybe the models just got good enough that brute force works now. When Ralph appeared in mid-2025, models needed many iterations to stumble toward a working answer, so the loop looked clever because it papered over unreliability. By mid-2026, with Opus 4.8 (and Fable 5, before the export-control mess pulled it offline), a single well-scoped pass often gets most of the way on its own. Under that reading, the "vindication" belongs to raw capability. The loop didn't win. It just stopped being needed as often, and we're mistaking a model improvement for a philosophy being correct.

I take that seriously, because there's truth in it. The models did get better. Some tasks that needed a 20-iteration loop in 2025 now land in two or three passes. If you'd run today's models in last year's loops, you'd have spent a fortune iterating on problems that no longer need iterating.

But here's the nuance that complicates the "it was just the models" line, and it comes from a sceptic, not a believer:

@MoneyPrinter0x

@ouro_bouros12 no it just means that my harnesses were extremely well engineered. i built the latest claude code's "workflows" feature before it was a thing, for my own personal use.

nowadays i run closed-loop compressed sprint cycles in 12 hours with repeatable composable subagents, with

13 June 2026

The argument there is that people dismissing the harness are "running the models naked." Strip away the loop, the tooling, the verification scaffolding, and you leave capability on the table even with a strong model. Which cuts against the pure-capability story. If the harness and the loop didn't matter, running the model bare would produce the same results. It doesn't. The people getting the most out of these models are the ones who built the iteration machinery around them, not the ones typing better sentences.

So where do I actually land? Credit is shared, and I'll say that plainly rather than pretend the loop camp gets all of it. Model progress did real work here. But the loop camp's core claim, that iterating against a verifiable goal beats optimising a single prompt, held up regardless of how strong the model got. Stronger models made loops cheaper and faster. They didn't make the prompt-perfecters right. If anything, better models raised the ceiling on what a good harness could do, which rewards the loop camp's instinct more, not less.

That's the vindication, narrowed to what it can actually carry. Not "loops solve everything." Just "iterate-against-verification was the right thing to optimise, and the perfect prompt was the wrong thing." On that specific claim, the loop people were right.

Two more concessions, because they're the ones that could sink the whole argument and I'd rather raise them than have you raise them for me. First, look hard at my evidence and most of it comes from Anthropic: the company shipped the feature, the company's own creator endorsed it, and I'm a bloke who uses Claude every day and runs a business on it. A vendor shipping something proves the vendor placed a bet, not that the bet was correct. Theo's concession is the only load-bearing data point that isn't Anthropic or me, and it's one person changing his mind, not a camp surrendering. So treat the confidence in this piece as a strong argument, not a closed case. I think the feature releases are real signal. I also know I'm the last person who'd notice if they weren't.

Second, and more important: this only works where "done" is mechanically checkable. A loop is exactly as good as the success condition you can write, which means loop-driven development wins cleanest on the slice of work that has hard tests, clear acceptance criteria, a green-or-red answer. A schema migration. A framework upgrade. A test suite to satisfy. It wins far less, maybe not at all, on the underspecified, taste-driven, "I'll know it when I see it" work that fills most actual workdays. Nobody's looping their way to a good product decision. So when I say the loop people were right, read it bounded: they were right about the verifiable slice, and that slice is bigger than the prompt-perfecters thought, but it isn't all of software, and pretending otherwise is how you end up disappointed at 2am watching a loop confidently solve the wrong problem.

The part nobody's solved: how do you know it worked?

Which brings me to the bit that keeps the smug grin off my face.

A loop that runs until tests pass is only as trustworthy as the tests. And there's a nastier failure than a loop that breaks: a loop that succeeds, declares victory, and is wrong. It produced plausible code that passes a weak check, and at the speed these things move, it produced it faster than any human could read it. This is the verification problem, and it's the genuinely unsolved part of loop-driven development.

The sharpest framing of the risk came from a developer looking straight at /goal:

Alex

@alexanderOpalic

The new /goal feature that both Codex and Claude Code have is really a tool to be aware of. One good example is if your page loads images slowly, you can use /goal; of course, you also need to define how a good goal should look. The idea is to give the agent the ability to verify

16 June 2026

The point: when you give an agent the ability to verify its own changes end to end, the danger is that it games the verification. It satisfies the letter of your success condition without meeting the spirit of it. His caveat is the right one, though, and worth holding onto: if your codebase has genuinely good tests, this is hard to pull off. The agent can trick a flimsy condition. It struggles to trick a real one.

So what does a real one look like in practice? The most convincing answer I've seen isn't a better test suite. It's a sceptic.

A developer named Ethereal built an adversarial reviewer for loop output, an agent whose entire job is to try to break every diff the loop produces. Not to confirm it works. To find the case where it doesn't. That's the same instinct behind Dynamic Workflows' adversarial verification, where some agents in the crew exist specifically to attack the others' output. And it's an important admission hiding in plain sight: the fact that the most advanced loop systems build in a dedicated adversary tells you the loop alone was never sufficient. You don't need a sceptic agent if iteration reliably produces correct code. You need one because it doesn't.

This is where I'd push back on anyone reading the vindication as "set it and forget it." The loop won the argument about how to structure the work. It did not win the argument about trust. Trust still gets built the way it always was in engineering: by something trying hard to prove the work wrong before you ship it.

How the people who built it actually work

If you want to know how far this goes, look at the team behind Claude Code, because they've taken it further than almost anyone.

Boris Cherny, who created Claude Code, put it plainly on Lenny's Podcast in February: "I don't prompt Claude anymore. I have loops that are running. They're the ones that are prompting Claude and figuring out what to do. My job is to write loops." That isn't a tip. It's his actual job changing shape.

And he's got the receipts. In December he posted that over the previous thirty days, 100% of his contributions to Claude Code were written by Claude Code: 259 pull requests, and by his own account he hadn't opened an IDE all month. On the podcast he said he hasn't hand-edited a line of code since November, and that he ships somewhere between ten and thirty pull requests a day. That's the loop philosophy stress-tested by the person with the most to lose if it breaks, running on the tool's own codebase.

The practices underneath it are unglamorous and copyable. A CLAUDE.md file checked into the repo as persistent memory, where the team writes mistakes back in so the agent stops repeating them. Three to five git worktrees running at once, each with its own Claude session, which Cherny has called the single biggest productivity unlock. Sid Bidasaria, a founding engineer on the team, built the subagents feature (reportedly in about three days) so a single run could spin up specialised helpers instead of one agent grinding through everything alone.

None of that needs Anthropic-scale infrastructure. A CLAUDE.md file, a few parallel worktrees, and the discipline to write loops instead of prompts is available to any small team that decides to work this way. The gap between the people building these tools and the people using them isn't budget. It's whether you've reorganised how you work around the loop, or you're still treating the agent like a clever autocomplete you have to supervise.

What this means if you write code or run a team

Strip away the meme war and there's a practical shift that changes how you hire, train, and spend your time.

For years the valuable skill was prompt craft: scoping a request, supplying the right context, talking to the model well. It's commoditising fast, partly because the models forgive sloppier prompts now and partly because the loop absorbs what careful prompting used to do for you. The skill that's appreciating is building the verification harness: defining what "done" means in checkable terms, writing tests an agent can't satisfy without solving the real problem, designing the adversarial review that catches plausible-but-wrong output before it lands in main.

So if you lead a team, change what you assess for. Stop grading people purely on how cleverly they prompt an agent. Start grading whether they can define a goal an agent can be measured against, and build the checks that make autonomous iteration safe to trust. At Webcoda the shift has been quieter than the internet suggests (we're not running hundred-agent swarms on government content), but the pattern's identical: more time on test coverage and success conditions before a longer run, less time wordsmithing the opening instruction. The investment moved from the front of the task to the scaffolding around it.

Where this leaves us

I've argued the loop people were right, narrowed it to what that actually means, conceded the model-progress credit, and flagged verification as the open wound. One last bit of honesty about my own track record: I thought I was right about the loop in 2025 too, so the difference now is that I'm watching for the ways I could be wrong, not just the ways I could be proven right. The models will keep improving and muddying how much credit the loop philosophy really deserves, and some of this will read differently in a year.

But the core of it I'll stand on. The room split in 2025 over whether to perfect the prompt or trust the loop, and Anthropic settled it by building the loop into the product and watching its own creator stop prompting. The hack got a handle, then a stopping condition, then a crew. The caveman with a rock turned out to be onto something. The work now isn't proving that. It's making sure the loop didn't just succeed, but actually got it right.

Key Takeaways

The vindication, precisely:

The 2025 developer split was prompt-craft vs loop iteration. The loop camp's core claim, that iterating against a verifiable goal beats optimising one prompt, held up.
The proof isn't a tweet. It's Anthropic shipping native /loop and /goal, and Claude Code's creator Boris Cherny saying he builds loops instead of prompting.

The four mechanisms (same idea, rising sophistication):

Ralph loop: a bash while loop, same prompt, no stopping condition, brute force.
/loop: native scheduled re-running on a cadence (fixed, dynamic, or a maintenance prompt), progress in files/git/tests. Persistence with a clock.
/goal: iterate until a defined success condition is met, with a separate evaluator model judging each turn, then stop. "The Ralph loop built in."
Dynamic Workflows: orchestrated parallel agents with adversarial verification, where some agents attack the others' output.

The honest caveats:

Credit is shared with model progress. Stronger models made loops cheaper, not the prompt-perfecters right.
Verification is unsolved. A loop can produce plausible-but-wrong code, or game a weak success condition. Adversarial review and genuinely strong tests are the current answer.

The practical shift:

The appreciating skill is building the verification harness, not writing the perfect prompt.
For teams: assess whether people can define checkable goals and build the checks that make autonomous iteration trustworthy.

---

Sources

Theo (@theo). "I hate to admit it but the loop people were right." X, 17 June 2026. https://x.com/theo/status/2067115748959682743
CyrilXBT (@cyrilXBT). Boris Cherny quote: "/loops is my favorite feature today. I'm not prompting Claude anymore. I'm building loops." X, 16 June 2026. https://x.com/cyrilXBT/status/2066794406171042242
0xCodez (@0xCodez). Boris Cherny quote (podcast clip): "feature I'm using most is /loops. I'm not prompting Claude anymore - I'm building loops." X, 15 June 2026. https://x.com/0xCodez/status/2066530826121036038
ClaudeDevs (@ClaudeDevs). "It's the Ralph loop, built into Claude Code. Every time Claude tries to stop, it checks your condition against the transcript." X, May 2026. https://x.com/ClaudeDevs/status/205435103456756...
Alex (@alexanderOpalic). On /goal self-verification and the self-trick risk. X, 16 June 2026. https://x.com/alexanderOpalic/status/2066823149...
Sam Cui (@samcmkt). Ralph loop one-liner and Huntley's "sit on the loop, not in it" rule. X, 13 June 2026. https://x.com/samcmkt/status/2065851191368839668
Shaw (@shawmakesmagic). "It doesn't work well... It is a hype larp." X, January 2026. https://x.com/shawmakesmagic/status/20142743567...
MoneyPrinter0x (@MoneyPrinter0x). On harness vs model, "running the models naked." X, 13 June 2026. https://x.com/MoneyPrinter0x/status/20659090520...
Ethereal (@inferencegod). On building an adversarial reviewer that "tries to BREAK every diff." X, 16 June 2026. https://x.com/inferencegod/status/2066726919920...
Geoffrey Huntley. "Ralph Wiggum as a 'software engineer'." ghuntley.com, 2025. https://ghuntley.com/ralph/
Boris Cherny, on Lenny's Podcast. "My job is to write loops." 19 February 2026. https://www.lennysnewsletter.com/
Boris Cherny. X post on 30 days of contributions: "100% of my contributions to Claude Code were written by Claude Code" (259 PRs). December 2025. (reported via officechai.com and Hacker News)
Gergely Orosz (Pragmatic Engineer). "How Claude Code is built" (Sid Bidasaria, founding engineer, creator of subagents). 2026. https://newsletter.pragmaticengineer.com/p/how-...
The New Stack. "Loop Engineering." 9 June 2026. https://thenewstack.io/loop-engineering/

---