from Hacker News

Vibe code isn't meant to be reviewed

by Firfi on 6/9/25, 3:34 PM with 26 comments

  • by ufo on 6/9/25, 3:52 PM

    This reminds me of the old idea that you could get some architecture astronauts to specify a system and then offload the implementation of the individual modules to the cheapest developers one could find, which didn't always work well in practice...
  • by dtagames on 6/9/25, 3:46 PM

    Great stuff here. I've also found that doing all the architecture and interface work manually, then having Cursor write plans that follow that architecture to implement specific features, is an ideal way to separate the human work from the vibe coding.

    From there, it's easy to revise the plan yourself (or have Cursor do it) and re-run it to make individual implementation changes or fixes that don't affect your architecture or invent new interfaces.
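
    To make that concrete, here's a minimal sketch of the split in TypeScript (all names here are hypothetical, not from the article):

        // Hand-written architecture layer: the human owns this file.
        export interface Invoice {
          id: string;
          customerId: string;
          totalCents: number; // integer cents avoids float rounding
        }

        export interface InvoiceRepository {
          findById(id: string): Promise<Invoice | null>;
          save(invoice: Invoice): Promise<void>;
        }

        // The plan handed to Cursor is then scoped to one contract, e.g.
        // "Implement InvoiceRepository backed by Postgres. Do not add
        // methods or change the Invoice type."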

  • by jasonthorsness on 6/9/25, 4:19 PM

    There's a lot of precedent for excluding "generated" code from review and linting, but it's a bit weird when that is a large portion of the application and you can't rely on its correctness (non-LLM generated code is typically very mechanically generated, so you just need to ensure the inputs are correct, not the outputs).

    I think testing and reviewing LLM-generated code remains just as important. Hopefully LLMs will get better and reviewing will get easier (and hopefully LLMs can also assist with reviews).

  • by rco8786 on 6/9/25, 4:14 PM

    There's just so much that feels "off" here, it's hard to put my finger on any particular part of it.

    Fundamentally, we acknowledge that AI writes crappy code, and there's no visible path forward to really change that. I realize that it's getting incrementally better against various benchmarks, but it would need a large step-function change in code quality/accuracy to be considered "good" at software engineering.

    I get the appeal of trying to provide it stricter guardrails. And I'm willing to bet that the overall quality of the system built this way is better than one that is just 100% vibe coded.

    But that also implies a spectrum of quality between human code and vibe code, where the closer you get to human code, the higher the quality, and vice versa. The author says this as well. But is there really an acceptable quality bar that can be reached with a significant % of the codebase being vibe coded? I'm pretty skeptical (speaking as someone who uses AI tools all the time).

    > “Does it work? Does it pass tests? Doesn’t it sneak around the overseer package requirements? Does it look safe enough? Ship it.”

    If this type of code review were sufficient, we would already be doing it for human code. Wouldn't we?

    > The business logic is in the interface packages - here’s exactly how it works. The implementation details are auto-generated, but the core logic is solid.

    I don't understand how to separate "business logic" from "implementation details". These things overlap like a Venn diagram, but the author seems to treat them as mutually exclusive.

  • by drivingmenuts on 6/9/25, 5:20 PM

    How do you trust the generated vibe code? With human coding and review, you know what's in the code and what it's doing behind your back. How does that work with vibe code?

    It seems like something that should NEVER be trusted - you don't know the source of the original code inhaled by the AI and the AI doesn't actually understand what it's taking in. Seems like a recipe for disaster.

  • by iwontberude on 6/9/25, 3:57 PM

    This is certainly not engineering. Maybe I'm finally a curmudgeon after years of chasing everything in this industry and getting burned enough times, but vibe coding without manual review and testing is antithetical to the contemporary study of software engineering and development. Are we really gluttons for punishment like this? Like writing a novel on a Nokia 6630, it's a fun oddity but not really a scalable way to create value.
  • by deadbabe on 6/9/25, 3:57 PM

    Be honest: do people vibe code because they simply can’t imagine all the complexity and details of what they’re trying to achieve, or is it simply because it’s faster than typing it all out?

    If it’s the latter, perhaps it’s a sign that we are making languages too verbose, and there’s a lot of boilerplate patterns that could be cut down if we give ourselves wider vocabulary (syntax) to express concepts.

    In the end, if we can come up with a language that is 1 to 1 with the time and effort spent to write equivalent prompts, there will be no need for vibe coding anymore unless you really don’t know what you’re doing, in which case you should develop your skills or simply not be a software engineer. Some may say this language already exists.

  • by jillesvangurp on 6/9/25, 5:22 PM

    It's not that different from managing a few junior developers and getting them to do stuff for you. Sometimes doing it yourself is faster but letting them do it is a good investment because it makes them better prepared for the next time you want something from them. That's how they become senior developers.

    With AIs/vibe coding/whatever you want to call it, there is no such benefit. It's more an opportunistic thing. You can delegate or do it yourself. If delegating is overall faster and better, it's an easy choice. Otherwise, it's your own time you are wasting.

    Using this stuff (like everybody else) over the last two years has definitely planted the thought that I need to start thinking in terms of having LLM-friendly code bases. I seem to get much better results when things are modular, well documented, and not too ambiguous. Of course, that's what makes code bases nice to work with in general, so these are not bad goals to have.

    Working with large code bases is hard and expensive (more tokens) and creates room for ambiguity. So, break it up. Modularize. Apply those SOLID principles. Or get your agentic coding tool of choice to refactor things for you; no need to do that yourself. All you need to do is nudge things in the right direction. And that would be a good idea without AIs anyway. So all this stuff does is remove excuses for not having better code.

    If you only vibe code and don't care, you just create a big mess that then needs cleaning up. Or for somebody else to clean up, because what's your added value at that point? Your vibes aren't that valuable. Working software is. The difference between a product that makes money and a vibe-coded thing that you look at and then discard is that one pays the bills and the other is just for your entertainment.

  • by rich_sasha on 6/9/25, 4:05 PM

    Did they dogfood their own approach and have AI fill in the boring bits between headlines? I get that "vibe" from this article...
  • by Firfi on 6/9/25, 3:34 PM

    After some vibe-coding frustrations, ups and downs, I found that splitting the code explicitly into well-curated, domain-heavy guidance code and code marked "slop" can resolve a lot of frustration and inefficiency.

    We can be honest in our PR, “yes, this is slop,” while being technical and picky about code that actually matters.

    The “guidance” code is not only great for preserving knowledge and aiding the discovery process, but it is very strong at creating a system of “checks and balances” for your AI slops to conform to, which greatly boosts vibe quality.

    It helps me both technically (at least I feel so), guiding Claude Code to do exactly what I want (or what we agreed to!), and psychologically, because there's no detachment from the knowledge of the system anymore.
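
    As a minimal sketch of what I mean (hypothetical names, TypeScript):

        // guidance/money.ts -- curated, domain-heavy code the human owns.
        // A branded type stops the slop from passing raw numbers around.
        export type Cents = number & { readonly __brand: "Cents" };

        export function toCents(n: number): Cents {
          if (!Number.isInteger(n) || n < 0) throw new Error("invalid amount");
          return n as Cents;
        }

        export interface TransferService {
          transfer(from: string, to: string, amount: Cents): Promise<void>;
        }

        // slop/transfer.ts -- generated, only loosely reviewed. It only
        // compiles if it conforms to the guidance contract above.
        import { Cents, TransferService } from "../guidance/money";

        export class InMemoryTransferService implements TransferService {
          private balances = new Map<string, number>();

          async transfer(from: string, to: string, amount: Cents): Promise<void> {
            const src = this.balances.get(from) ?? 0;
            if (src < amount) throw new Error("insufficient funds");
            this.balances.set(from, src - amount);
            this.balances.set(to, (this.balances.get(to) ?? 0) + amount);
          }
        }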

  • by starkparker on 6/9/25, 4:01 PM

    This sets up a pointless strawman about reviews for the sake of the headline, one that distracts so much from the point that I only caught it on a second read:

    Restrict agentic workflows to implementation details, hand-write the higher-level logic and critical tests, and only pay attention to whether those human-written tests pass or fail. Then you don't have to worry about reviewing agent-generated code as long as the human-written tests about the functionality pass.

    (Still not sure I agree, not least for security and performance reasons at existing orgs; this assumes very good test coverage and design exist before any code is written. Interesting for greenfield projects, though.)
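
    If I read it right, the contract would live entirely in human-written tests, something like this (a hypothetical example in TypeScript with vitest, not from the article):

        import { describe, expect, it } from "vitest";
        // Agent-generated module: nobody reads it; only these tests gate it.
        import { slugify } from "./generated/slugify";

        describe("slugify (generated implementation)", () => {
          it("lowercases and hyphenates whitespace", () => {
            expect(slugify("Hello World")).toBe("hello-world");
          });

          it("collapses repeated separators", () => {
            expect(slugify("foo  --  bar")).toBe("foo-bar");
          });

          it("never emits characters outside [a-z0-9-]", () => {
            expect(slugify("Tabs\tand_underscores")).toMatch(/^[a-z0-9-]*$/);
          });
        });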