Eight years of wanting, three months of building with AI

436 points - today at 12:43 PM

Comments

Aurornis today at 2:55 PM

Refreshing to see an honest and balanced take on AI coding. This is what real AI-assisted coding looks like once you get past the initial wow factor of having the AI write code that executes and does what you asked.

This experience is familiar to every serious software engineer who has used AI code gen and then reviewed the output:

> But when I reviewed the codebase in detail in late January, the downside was obvious: the codebase was complete spaghetti14. I didn’t understand large parts of the Python source extraction pipeline, functions were scattered in random files without a clear shape, and a few files had grown to several thousand lines. It was extremely fragile; it solved the immediate problem but it was never going to cope with my larger vision,

Some people never get to the part where they review the code. They go straight to their LinkedIn or blog and start writing (or having ChatGPT write) posts about how manual coding is dead and they’re done writing code by hand forever.

Some people review the code and declare it unusable garbage, then also go to their social media and post how AI coding is completely useless and they’re not going to use it for anything.

This blog post shows the journey that anyone not in one of those two vocal minorities is going through right now: A realization that AI coding tools can be a large accelerator but you need to learn how to use them correctly in your workflow and you need to remain involved in the code. It’s not as clickbaity as the extreme takes that get posted all the time. It’s a little disappointing to read the part where they said hard work was still required. It is a realistic and balanced take on the state of AI coding, though.

dirtbag__dad today at 4:54 PM

> Tests created a similar false comfort. Having 500+ tests felt reassuring, and AI made it easy to generate more. But neither humans nor AI are creative enough to foresee every edge case you’ll hit in the future; there are several times in the vibe-coding phase where I’d come up with a test case and realise the design of some component was completely wrong and needed to be totally reworked. This was a significant contributor to my lack of trust and the decision to scrap everything and start from scratch.

This is my experience. Tests are perhaps the most challenging part of working with AI.

What’s especially awful is any refactor of existing shit code that does not have tests to begin with, and the feature is confusing or inappropriately and unknowingly used multiple places elsewhere.

AI will write test cases that the logic works at all (fine), but the behavior esp what’s covered in an integration test is just not covered at all.

I don’t have a great answer to this yet, especially because this has been most painful to me in a React app, where I don’t know testing best practices. But I’ve been eyeing up behavior driven development paired with spec driven development (AI) as a potential answer here.

Curious if anyone has an approach or framework for generating good tests

lubujackson today at 3:48 PM

Long term, I think the best value AI gives us is a poweful tool to gain understanding. I think we are going to see deep understanding turn into the output goal of LLMs soon. For example, the blocker on this project was the dense C code with 400 rules. Work with LLMs allowed the structure and understanding to be parsed and used to create the tool, but maybe an even more useful output would be full documentation of the rules and their interactions.

This could likely be extracted much easier now from the new code, but imagine API docs or a mapping of the logical ruleset with interwoven commentary - other devtools could be built easily, bug analysis could be done on the structure of rules independent of code, optimizations could be determined on an architectural level, etc.

LLMs need humans to know what to build. If generating code becomes easy, codifying a flexible context or understanding becomes the goal that amplifies what can be generated without effort.

rokob today at 3:05 PM

> architecture is what happens when all those local pieces interact, and you can’t get good global behaviour by stitching together locally correct components

This is a great article. I’ve been trying to see how layered AI use can bridge this gap but the current models do seem to be lacking in the ambiguous design phase. They are amazing at the local execution phase.

Part of me thinks this is a reflection of software engineering as a whole. Most people are bad at design. Everyone usually gets better with repetition and experience. However, as there is never a right answer just a spectrum of tradeoffs, it seems difficult for the current models to replicate that part of the human process.

PaulHoule today at 12:48 PM

Note I believe this one because of the amount of elbow grease that went into it: 250 hours! Based on smaller projects I’ve done I’d say this post is a good model for what a significant AI-assisted systems programming project looks like.

stepan_l today at 7:32 PM

I had the same experience, been working on my project for a few months and it started very easy and then I lost control of the code base. Had to rewrite a lot of things. The code AI writes does not look bad, but there is something wrong about it. It just does not feel right. You still need to steer it a lot. But I am very happy that I could write a quite complex project with almost no dependencies at all. Only used Electron. I don't even use npm. That is very promising how far you can get without relying on any libraries/frameworks. You can check it here https://github.com/AgentWFY/AgentWFY MIT license.

simonw today at 7:34 PM

I got the Syntaqlite Rust/Python extension working in WebAssembly and Pyodide a few weeks ago: https://github.com/simonw/research/tree/main/syntaqlite-pyth...

I just extended that demo to one that runs the resulting Pyodide library in a browser with a playground interface for trying it out: https://tools.simonwillison.net/syntaqlite

cloche today at 5:16 PM

Really great to see a realistic experience sans hype about AI tools and how they can have an impact.

> But when I reviewed the codebase in detail in late January, the downside was obvious: the codebase was complete spaghetti...It was extremely fragile; it solved the immediate problem but it was never going to cope with my larger vision...I decided to throw away everything and start from scratch

This part was interesting to me as it lines up with Fred Brooks "throw one away" philosophy: "In most projects, the first system built is barely usable. Hence plan to throw one away; you will, anyhow."

As indicated by the experience, AI tools provide a much faster way of getting to that initial throw-away version. That's their bread and butter for where they shine.

Expecting AI tools to go directly to production quality is a fool's errand. This is the right way to use AI - get a quick implementation, see how it works and learn from it but then refactor and be opinionated about the design. It's similar to TDD's Red, Green, Refactor: write a failing test, get the test passing ASAP without worrying about code quality, refactor to make the code better and reliable.

In time, after this hype cycle has died down, we'll come to realize that this is the best way to make use of AI tools over the long run.

> When I had energy, I could write precise, well-scoped prompts and be genuinely productive. But when I was tired, my prompts became vague, the output got worse

This part also echoes my experience - when I know well what I want, I'm able to write more specific specifications and guide along the AI output. When I'm not as clear, the output is worse and I need to spend a lot more time figuring it out or re-prompting.

throwaway47001 today at 6:32 PM

I appreciate these kind of fact-based posts. Thank you for this.

Unfortunately, AI seems to be divisive. I hope we will find our way back eventually. I believe the lessons from this era will reverberate for a long time and all sides stand to learn something.

As for me, I can’t help but notice there is a distinct group of developers that does not get it. I know because they are my colleagues. They are good people and not unintelligent, but they are set in their ways. I can imagine management forcing them to use AI, which at the moment is not the case, because they are such laggards. Even I sometimes want to “confront” them about their entire day wasted on something even the free ChatGPT would have handled adequately in a minute or two. It’s sad to see actually.

We are not doing important things and we ourselves are not geniuses. We know that or at least I know that. I worry for the “regular” developer, the one that is of average intellect like me. Lacking some kind of (social) moat I fear many of us will not be able to ride this one out into retirement.

zellyn today at 7:03 PM

Does SQLite not have a lemon parser generated for its SQL?

When I ported pikchr (also from the SQLite project) to Go, I first ported lemon, then the grammar, then supporting code.

I always meant to do the same for its SQL parser, but pikchr grammar is orders of magnitude simpler.

jillesvangurp today at 4:48 PM

This is the hardest it's ever going to be. That's been my mode for the last year. A lot of what I did in the last month was complete science fiction as little as six months ago. The scope and quality of what is possible seems to leap ahead every few weeks.

I now have several projects going in languages that I've never used. I have a side project in Rust, and two Go projects. I have a few decades experience with backend development in Java, Kotlin (last ten years) and occasionally python. And some limited experience with a few other languages. I know how to structurer backend projects, what to look for, what needs testing, etc.

A lot of people would insist you need to review everything the AI generates. And that's very sensible. Except AI now generates code faster than I can review it. Our ability to review is now the bottleneck. And when stuff kind of works (evidenced by manual and automated testing), what's the right point to just say it's good enough? There are no easy answers here. But you do need to think about what an acceptable level of due diligence is. Vibe coding is basically the equivalent of blindly throwing something at the wall and seeing what sticks. Agentic engineering is on the opposite side of the spectrum.

I actually emphasize a lot of quality attributes in my prompts. The importance of good design, high cohesiveness, low coupling, SOLID principles, etc. Just asking for potential refactoring with an eye on that usually yields a few good opportunities. And then all you need to do is say "sounds good, lets do it". I get a little kick out of doing variations on silly prompts like that. "Make it so" is my favorite. Once you have a good plan, it doesn't really matter what you type.

I also ask critical questions about edge cases, testing the non happy path, hardening, concurrency, latency, throughput, etc. If you don't, AIs kind of default to taking short cuts, only focus on the happy path, or hallucinate that it's all fine, etc. But this doesn't necessarily require detailed reviews to find out. You can make the AI review code and produce detailed lists of everything that is wrong or could be improved. If there's something to be found, it will find it if you prompt it right.

There's an art to this. But I suspect that that too is going to be less work. A lot of this stuff boils down to evolving guardrails to do things right that otherwise go wrong. What if AIs start doing these things right by default? I think this is just going to get better and better.

moshib today at 6:27 PM

> There’s an uncomfortable parallel between using AI coding tools and playing slot machines28. You send a prompt, wait, and either get something great or something useless. I found myself up late at night wanting to do “just one more prompt,” constantly trying AI just to see what would happen even when I knew it probably wouldn’t work. The sunk cost fallacy kicked in too: I’d keep at it even in tasks it was clearly ill-suited for, telling myself “maybe if I phrase it differently this time.”

Oof, this hit very close to home. My workplace recently got, as a special promotion, unlimited access to a coding agents with free access to all the frontier models, for a limited period of time. I find it extremely hard to end my workday when I get into the "one more prompt" mindset, easily clocking 12-hour workdays without noticing.

smj-edison today at 5:38 PM

The description of working with AI tools really resonates with me. It's dangerous to work on my codebase when I'm tired, since I don't feel like doing it properly, so I play slots with Claude, and stay up later than I should. I usually come back later and realize the final code that gets generated is an absolute mess.

It is really good for getting up to speed with frameworks and techniques though, like they mentioned.

DareTheDev today at 2:47 PM

This is very close to my experience. And I agree with the conclusion I would like to see more of this

billylo today at 3:02 PM

Thank you. The learning aspect of reading how AI tackles something is rewarding.

It also reduces my hesitation to get started with something I don't know the answer well enough yet. Time 'wasted' on vibe-coding felt less painful than time 'wasted' on heads-down manual coding down a rabbit hole.

ang_cire today at 5:15 PM

It's a huge mistake to start building with Claude without mapping out a project in detail first, by hand. I built a pretty complex device orchestration server + agent recently, and before I set Claude to actually coding I had ~3000 lines of detailed design specs across 7 files that laid out how and what each part of the application would do.

I didn't have to review the code for understanding what Claude did, I reviewed it for verifying that it did what it had been told.

It's also nuts to me that he had to go back in later to build in tests and validation. The second there is an input able to be processed, you bet I have tests covering it. The second a UI is being rendered, I have Playwright taking screenshots (or gtksnapshot for my linux desktop tools).

I think people who are seeing issues at the integration phase of building complex apps are having that happen because they're not keeping the limited context in mind, and preempting those issues by telling their tools exactly how to bridge those gaps themselves.

dcre today at 5:23 PM

"Knowing where you are on these axes at any given moment is, I think, the core skill of working with AI effectively."

I like this a lot. It suggests that AI use may sometimes incentivize people to get better at metacognition rather than worse. (It won't in cases where the output is good enough and you don't care.)

simondotau today at 2:45 PM

This essay perfectly encapsulates my own experience. My biggest frustration is that the AI is astonishingly good at making awful slop which somehow works. It’s got no taste, no concern for elegance, no eagerness for the satisfyingly terse. My job has shifted from code writer to quality control officer.

Nowhere is this more obvious in my current projects than with CRUD interface building. It will go nuts building these elaborate labyrinths and I’m sitting there baffled, bemused, foolishly hoping that THIS time it would recognise that a single SQL query is all that’s needed. It knows how to write complex SQL if you insist, but it never wants to.

But even with those frustrations, damn it is a lot faster than writing it all myself.

pwr1 today at 4:18 PM

This resonates. I had a project sitting in my head for years and finally built it in about 6 weeks recently. The AI part wasn't even the hard part honestly, it was finally commiting to actually shipping instead of overthinking the architecture. The tools just made it possible to move fast enough that I didn't lose momentum and abandon it like every other time.

bytefish today at 6:20 PM

This resonates with my experience.

I have several Open Source projects and wanted to refactor them for a decade. A week ago I sat down with Google Gemini and completely refactored three of my libraries. It has been an amazing experience.

What’s a game changer for me is the feedback loop. I can quickly validate or invalidate ideas, and land at an API I would enjoy to use.

bvan today at 2:52 PM

This a very insightful post. Thanks for taking the time to share your experience. AI is incredibly powerful, but it’s no free-lunch.

The_Goonies1985 today at 3:23 PM

The author mentions a C codebase. Is AI good at coding in C now? If so, which AI systems lead in this language?

Ideally: local; offline.

Or do I have to wrestle it for 250 hours before it coughs up the dough? Last time I tried, the AI systems struggled with some of the most basic C code.

It seemed fine with Python, but then my cat can do that.

myultidevhq today at 3:22 PM

The 8-year wait is the part that stands out. Usually the question is "why start now" not "why did it take 8 years". Curious if there was a specific moment where the tools crossed a threshold for you, or if it was more gradual.

javierhonduco today at 5:45 PM

Great write-up. As a side note (not a Googler myself and this is 100% my opinion) Lalit’s team was hiring in London, UK. If you are interested in working in low level performance tools, this might be a very cool opportunity!

edfletcher_t137 today at 4:16 PM

> Of all the ways I used AI, research had by far the highest ratio of value delivered to time spent.

Seconded!

soursoup today at 5:48 PM

The author apparently skipped ai-assisted refactoring and auditing before moving to prod.

4b11b4 today at 3:08 PM

Great write-up with provenance

zer00eyz today at 3:31 PM

This article is describing a problem that is still two steps removed from where AI code becomes actually useful.

90 percent of the things users want either A) dont exist or B) are impossible to find, install and run without being deeply technical.

These things dont need to scale, they dont need to be well designed. They are for the most part targeted, single user, single purpose, artifacts. They are migration scripts between services, they are quick and dirty tools that make bad UI and workflows less manual and more managable.

These are the use cases I am seeing from people OUTSIDE the tech sphere adopt AI coding for. It is what "non techies" are using things like open claw for. I have people who in the past would have been told "No, I will not fix your computer" talk to me excitedly about running cron jobs.

Not everything needs to be snap on quality, the bulk of end users are going to be happy with harbor freight quality because it is better than NO tools at all.

senthilnayagam today at 4:56 PM

when he decided on rust, he could have looked up sqlite port, libsqlite does a pretty good job.

FpUser today at 6:16 PM

I do not have anything resembling problems described. Before I ask AI to create new code (except super trivial things). I first split application into smaller functional modules. I then design structure of the code down to main classes and methods and their interaction. Also try to keep scope small. Then AI just fills out the actual code. I have no problems reviewing it. Sometimes I discover some issues - like using arrays instead of maps leading to performance issues but it is easily spotted.

holoduke today at 5:14 PM

A key take away from this article is that you as a developer spending as much time on refactoring as on the actual feature. You are constantly requesting code reviews, architectural assessements, consolidations, extractions etc. only then you can empower AI to become a force multiplier. And prevent slop and spaghetti code to be created. Nice article

dsteel today at 6:27 PM

[dead]

TraceAgently today at 4:08 PM

[dead]

techpulselab today at 4:13 PM

[dead]

meidad_g today at 3:58 PM

[dead]

toniantunovi today at 5:01 PM

[dead]

aplomb1026 today at 5:32 PM

[dead]

deleted today at 6:58 PM

vlubeschanin today at 6:59 PM

[dead]

afron_manyu today at 5:00 PM

[dead]

Yash_Claw today at 8:33 PM

[dead]

alejandrosplitt today at 3:49 PM

[dead]

huflungdung today at 5:12 PM

[dead]

rlenf today at 2:20 PM

[flagged]

intensifier today at 3:54 PM

article looks like a tweet turned into 30 paragraphs. hardly any taste.

deleted today at 2:41 PM