The Autonomous Arc

Over the past few weeks, I've kept hearing one name over and over: Ralph Wiggum.

Why?

Well, if you're keeping up with the rapid pace of AI development these days then you've undoubtedly stumbled across this chatter at least once.

But what is it? That's what I wanted to figure out this past weekend. So I got to building something fun for my gf and me.

Ralph Loops

The TLDR is that the Ralph loop is a workflow for AI agents (like Claude Code) that enables them to build software autonomously.

How does it do this?

The core idea is pretty simple. You need a markdown file to declare your specifications for an app or feature, another file to keep track of important information about your project (things like progress logs, key architecture notes, where to find features, etc.), and finally the workhorse: a bash loop that repeatedly runs an AI agent against a list of atomic tasks provided by your specs file. The magic is in the fact that the loop itself ensures the AI stays "smart" (prevents context rot) by working on each new task in a dedicated fresh context window. Pretty neat.

At the start of the run, your prompt.md instructs the AI to do the following:

Pick up some priority task from your specs.
When done with work, update the progress files, run tests, document learnings, and commit the changes.
This then repeats until the model emits a stop signal, usually defined as the stop condition in the prompt.md. The signal is a text snippet output by the model (e.g. <PROMISE>Done</PROMISE>) indicating that all of the tasks are done, at which point the loop exits.

# Example Ralph loop
while :; do
  output=$(cat prompt.md | claude --dangerously-skip-permissions)
  echo "$output"
  if echo "$output" | grep -q "<PROMISE>Done</PROMISE>"; then
    echo "Tasks complete, exiting."
    break
  fi
done
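
A variation on the loop above, as a minimal sketch: cap the number of iterations so a run that never prints the stop signal can't spin forever. The `run_ralph` function name and the cap are my own assumptions for illustration; the prompt file and stop marker are the same ones used above. The agent command is passed in as arguments so it's easy to swap out.

```shell
#!/usr/bin/env bash
# Sketch: a Ralph loop with an iteration cap (hypothetical helper).
# Usage: run_ralph MAX_ITERATIONS PROMPT_FILE AGENT_CMD [ARGS...]
run_ralph() {
  local max="$1" prompt_file="$2" i=1 output
  shift 2
  while [ "$i" -le "$max" ]; do
    echo "--- iteration $i of $max ---"
    # Each call starts a new agent process, so every task gets a fresh context.
    output=$("$@" < "$prompt_file")
    echo "$output"
    # -F matches the marker as a literal string, not a regex.
    if printf '%s\n' "$output" | grep -qF "<PROMISE>Done</PROMISE>"; then
      echo "Tasks complete, exiting."
      return 0
    fi
    i=$((i + 1))
  done
  echo "Hit the cap of $max iterations without a stop signal." >&2
  return 1
}

# Example (not run here):
#   run_ralph 25 prompt.md claude --dangerously-skip-permissions
```

The non-zero return on hitting the cap makes it easy to tell "finished" apart from "gave up" if you chain this with other commands.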

I went through all of this pretty quickly, but that's the rough idea. There are already a dozen different flavors of this concept out there to try out. Anthropic has their own Ralph plugin, but implemented a bit differently. They explicitly mention they don't use external bash loops, and the plugin instead runs everything in the current session. I haven't given it a try, but just keep in mind that this might run into the context issues I mentioned earlier.

If you're curious to learn more, I think the simplest starting point is the Ralph repo by Ryan Carson.

An Idea is Born

So after learning a bit about the core mechanism of Ralph, it was time to move from theory to building.

My girlfriend and I have been binging Top Chef over the past few months. Top Chef is a cooking competition show with a pretty stable format. Each season starts with ~15 chefs as contestants, and every episode consists of two rounds of challenges. At the end of each episode a chef, or sometimes multiple chefs, gets sent home.

I thought it'd be fun to gamify this experience a bit with an app, and so the Top Chef Voting app idea was born. The basic idea is that my gf and I can each individually vote on which chef will win a challenge or be eliminated each episode, and the app keeps track of our predictions throughout the season. Whoever has the most points at the end of the season can cash those points in for a prize.

This felt low stakes enough to actually try the Ralph loop.

Did It Work?

I mean, yeah. Kind of.

I ended up pointing Claude Code to Ryan Carson's Ralph repo and having it set up everything for me. I took a few minutes to read the instructions in the README.md and I was ready to roll.

I created my PRD, converted that PRD to the Ralph format (just a .json file with tasks) using the recommended skill from the repo, and let it rip. It was beautiful.

...and then 10 minutes later I realized something crucial. I couldn't really see what Claude was actually doing. I could see the results of its work, but while the loop was running, I could only see the output from the bash script itself. So I could tell when an iteration kicked off, or when it finished, but everything that Claude was thinking or doing in between was invisible. I was flying blind.

This ended up being a pretty big problem for me. There would be times when an iteration would get stuck for a long time and I wouldn't really have any insight as to what exactly was going on. So do I kill the script and restart? Or is Claude just taking really long to think through something? This happened pretty frequently.

I got around this a bit by tailing the Claude Code logs (using something like tail -f ~/.claude/debug/$(ls -t ~/.claude/debug/ | head -1)). With this running, I could at least tell when the model was churning on something, but still, it wasn't a great solution for actual thinking visibility.

I stumbled like this for some portion of Saturday afternoon. I'd start a Ralph loop, leave my computer for a bit, come back in a few minutes to see if a loop was stuck, and restart the loop if needed.
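
Part of the blindness comes from the loop itself: `output=$(...)` holds everything back until the agent exits. Here's a rough sketch of one way to soften both problems, not something I ran as part of this build. It assumes GNU coreutils' `timeout` is installed and that the agent writes to stdout as it works (if it buffers until the end, `tee` won't show much). `run_iteration` and the log paths are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: run one iteration with live output and a time limit (hypothetical helper).
# Usage: run_iteration LIMIT LOG_FILE PROMPT_FILE AGENT_CMD [ARGS...]
# Returns 0 only if the stop marker shows up in this iteration's output.
run_iteration() {
  local limit="$1" log_file="$2" prompt_file="$3"
  shift 3
  # timeout (GNU coreutils) kills the agent once LIMIT passes, e.g. "15m".
  # tee prints output as it arrives and keeps a copy for the stop check.
  timeout "$limit" "$@" < "$prompt_file" | tee "$log_file"
  grep -qF "<PROMISE>Done</PROMISE>" "$log_file"
}

# Example loop (not run here); each iteration gets its own log file:
#   mkdir -p logs; n=1
#   until run_iteration 15m "logs/iter-$n.log" prompt.md \
#       claude --dangerously-skip-permissions; do
#     n=$((n + 1))
#   done
```

A stuck iteration then gets killed and retried automatically instead of waiting for me to wander back to my desk, and the per-iteration logs leave something to read afterwards.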

By the end of the night, some 50-odd commits later, I had a pretty good working version of my app. The design wasn't great yet (in fact, it was pretty terrible), but at least everything worked. The database was hooked up, PWA functionality worked, and the architecture looked sane at a quick glance. I was pretty happy. What would've normally taken many hours spread across at least a week of my time was reduced to ~3-5 hours of work in a single Saturday afternoon, while I multitasked on other things.

After the initial Ralph loop finished, it was just a matter of small polish. A few more prompts later, I ended up with an app that I felt comfortable deploying and getting into the hands of our users (aka my gf and me).

So What?

I guess that's the big question. So what? What does this all mean? Honestly, idk. But I have some thoughts:

At the very least, if you're a software engineer and you're not paying attention to this yet and playing with the tools, you probably should. Things are accelerating quickly.
As I was working on this, it struck me how much friction has suddenly been removed from development and how fast the feedback loops are becoming. For example, I must've had Claude Code sketch out at least 4 or 5 different designs while thinking through the app's design before deciding on one direction that I sort of liked. That rough sketch became the base from which we kept iterating to get closer to what I felt I wanted. This is a completely new way of working that I think will have some pretty insane long-term consequences for how software gets built. It actually reminds me a lot of music production, but maybe that's a post for another time.
Logan Kilpatrick recently tweeted that engineers are artists now thanks to AI, and artists are increasingly becoming engineers. I think this is broadly true. Your imagination will be the only limit with these new technologies. Software creation will increasingly be seen as a normal thing that people do on the internet to express themselves, regardless of whether they studied CS.
There's a weird feeling that seems to tag along after playing with this stuff and realizing just how good it's getting. Because it's so easy to spin up software now, the opportunity cost of not having an agent working in the background while you go on about your life feels heavy. The bottleneck moved from knowing how to code to knowing what you want.

Anyway, here are some screens from the Top Chef voting app:

Top Chef app - Episode voting

Top Chef app - Leaderboard

Top Chef app - Chef list

Vinny
