by gatinsama on 3/8/25, 6:17 PM with 97 comments
by jit_hacker on 3/8/25, 10:18 PM
I spent a lot of time trying to think about how we arrived here. Where I work there are a lot of Senior Directors and SVPs who used to write code 10+ years ago, who, if you asked them to build a little hack project, would have no idea where to start. AI has given them back something they'd lost, because they can build something simple super quickly. But they fail to see that just because it accelerates their hack project doesn't mean it will accelerate someone who's an expert. I.e., AI might help a hobbyist plant a garden, but it wouldn't help a farmer squeeze out more yield.
by mikeocool on 3/8/25, 8:01 PM
In my experience, AI is helpful for that first 90% — when the codebase is pretty simple, and all of the weird business-logic edge cases haven't crept in. In the last 10% (as well as in most "legacy" codebases), it seems to have a lot of trouble understanding enough to generate helpful output at more than a basic level.
Furthermore, if you’re not deliberate with your AI usage, it really gets you into “this code is too complicated for the AI to be much help with” territory a lot faster.
I’d imagine this is part of why we’re not seeing an explosion of software productivity.
by photonthug on 3/8/25, 11:02 PM
10x, 20x, etc. productivity boosts really should be easy to see. My favorite example of this is the idea of porting popular things like MediaWiki/WordPress to popular things like Django/Rails. A charitable challenge, right? There's lots of history/examples, and it's more translation than invention. What about porting large, well-known codebases from C to Rust, etc.? Clearly people are interested in such things.
There would be a really really obvious uptick in interesting examples like this if impossible dreams were now suddenly weekend projects.
If you don't have an example like this... well, another vibe-coding anecdote about another CRUD app or a bash script with tricky awk is just not really what TFA is asking about. That is just evidence that LLMs have finally fixed search, which is great, but it's not the subject we're all most curious about.
by KaiserPro on 3/8/25, 9:11 PM
For me it's been an ever-so-slight net positive.
In terms of in-IDE productivity it has improved a little bit. Stuff that is mostly repetitive can be autocompleted by the LLM. It can, in some cases, provide function names from other files that traditional IntelliCode can't, because of codebase size.
However, it also hallucinates plausible shit, which significantly undermines the productivity gains above.
I suspect that if I asked it directly to create a function to do X, it might work better, rather than expecting it to work like autocomplete (even though I comment my code much more than my peers do).
Overall rating: for our codebase, it's not as good as C# IntelliCode/VS Code.
Where it is good is asking how I do some basic thing in $language that I have forgotten. Anything harder and it starts going into bullshit land.
I think if you have more comprehensive tests it works better.
I have not had much success with agentic workflow, mainly because I've not been using the larger models. (Our internal agentic workflow is limited access)
by techpineapple on 3/8/25, 7:24 PM
by EliRivers on 3/8/25, 8:05 PM
An example from today was using XAudio2 on windows to output sound, where that sound was already being fetched as interleaved data from a network source. I could have read the docs, found some example code, and bashed it together in a few hours; but I asked one of the LLMs and it gave me some example code tuned to my request, giving me a head start on that.
I had to already know a lot of context to be able to ask it the right questions, I suspect, and then to tune the result with a few follow-up questions.
by avastmick on 3/8/25, 9:17 PM
At first I was all in with Copilot and various similar plugins for neovim. It helped me get going but did produce the worst code in the application. Also I found (personal preference) that the autocomplete function actually slowed me down; it made me pause or even prevented me from seeing what I was doing rather than just typing out what I needed to. I stopped using any codegen for about four months at the end of 2024; I felt it was not making me more productive.
This year it's back on the table with avante[0] and Cursor (the latter back off the table due to the huge memory requirements). Then recently Claude Code dropped, and I am currently feeling like I have productivity superpowers. I've set it up in a pair-programming style (old XP coder) where I write careful specs (prompts) and tests (which I code); it writes code; I review, run the tests, and commit. I work with it. I do not just let it run, as I have found I waste more time unwinding its output than I would spend watching each step.
From being pretty disillusioned six months ago I can now see it as a powerful tool.
Can it replace devs? In my opinion, some. Like all things, it's garbage in, garbage out. So the idea that a non-technical product manager can produce quality output seems unlikely to me.
by summarity on 3/8/25, 9:33 PM
by malux85 on 3/8/25, 8:40 PM
Some of the juniors I mentor cannot formulate their questions clearly and, as a result, get a poor answer. They don't understand that an LLM will answer the question you ask, which might not be the globally best solution; it's just answering your question. And if you ask the question poorly (or worse, ask the wrong question), you're going to get bad results.
I have seen significant jumps in senior programmers' capabilities, in some cases 20x, and when I see a junior or intermediate complaining about how useless LLM coding assistants are, it always makes me very suspicious of the person, in that I think the problem is almost certainly their poor communication skills causing them to ask the wrong things.
by hirsin on 3/8/25, 8:46 PM
This rests on bad assumptions about what higher productivity looks like.
Other alternatives include:
1. Companies require fewer engineers, so there are layoffs. Software products are cheaper than before because the cost to build and maintain them is reduced.
2. Companies require fewer engineers, so they lay them off and retain the spend, redirecting it to stock buybacks or exec comp.
And certainly it feels like we've seen #2 out in the wild.
Assuming that the number of people working on software you use remains constant is not a good assumption.
(Personally this has been my finding. I'm able to get a bit more done in my day by, e.g., writing a quick script to do something tedious. But not 5x more.)
by barnabee on 3/8/25, 9:05 PM
Sometimes they give me maybe a 5–10% improvement (i.e. nice but not world-changing). Usually that's when they're working as an alternative to docs, solving the odd bug, helping write tests or occasional glue code, etc., for a bigger or more complex/important solution.
In other cases I’ve literally built a small functioning app/tool in 6–12 hours of elapsed time, where most of that is spent waiting (all but unattended, so I guess this counts as “vibe coding”) while the LLM does its thing. It’s probably required less than an hour of my time in those cases and would easily have taken at least 1–2 days, if not more for me. So I’d say it’s at least sometimes comfortably 10x.
More to the point, in those cases I simply wouldn’t have tried to create the tool, knowing how long it’d take. It’s unclear what the cumulative incremental value of all these new tools and possibilities will be, but that’s also non-zero.
by throwawa14223 on 3/8/25, 9:13 PM
Copilot is very good at breaking my flow, and all of the agent-based systems I have tried have been disappointing at following incredibly simple instructions.
Coding is much easier and faster than writing instructions in English, so it is hard to justify anything I have seen so far as a time saver.
by philjohn on 3/8/25, 11:08 PM
And when you name your test cases in a common pattern such as "MethodName_ExpectedBehavior_StateUnderTest" (sketched below), the LLM is able to figure it out about 80% of the time.
Then the other 20% of the time I'll make a couple of corrections, but it's definitely sped me up by a low double-digit percentage ... when writing tests.
When writing code, it seems to get in the way more often than not, so I mostly don't use it - but then again, a lot of what I'm doing isn't boilerplate CRUD code.
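As an illustration of that naming pattern, here's a minimal sketch in TypeScript/Jest (the commenter's actual language and test framework aren't named in the thread, and `parseAmount` is a hypothetical function invented for the example):

```typescript
import { describe, expect, test } from "@jest/globals";

// Hypothetical function under test, invented for this example.
function parseAmount(input: string): number {
  const value = Number(input.trim());
  if (Number.isNaN(value)) throw new Error(`not a number: ${input}`);
  return value;
}

describe("parseAmount", () => {
  // Each name encodes the method, the expected behavior, and the state
  // under test, per the pattern above; a name this descriptive is often
  // enough context for an LLM to draft the test body.
  test("ParseAmount_ReturnsNumber_WhenInputHasWhitespace", () => {
    expect(parseAmount("  42 ")).toBe(42);
  });

  test("ParseAmount_Throws_WhenInputIsNotNumeric", () => {
    expect(() => parseAmount("abc")).toThrow();
  });
});
```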
by havaloc on 3/8/25, 10:14 PM
Writing a new view used to take 5–10 minutes, but now I can do it in 30 seconds. Since it's the most basic PHP/MySQL imaginable, it works very well: none of those frameworks to confuse the LLM or suck up the context window.
The point, I guess, is that I can do it the old-fashioned way because I know how, but I don't have to: I can tell ChatGPT exactly what I want and how I want it.
by lfsh on 3/8/25, 9:21 PM
For example, a piece of code with a foreach loop that uses the collection name inside the loop instead of the item name.
Or a very nice-looking piece of code with a call to a method that does not exist in the library being used.
I think the weakness of AI/LLMs is that they output probabilities. If the code you request is very common, then it will probably generate good code. But that's about it. It cannot reason about code (it can maybe 'reason' about the probability of the generated answer).
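For concreteness, a minimal sketch (in TypeScript; the commenter doesn't name a language) of the first kind of slip described above:

```typescript
const orders: { id: number; total: number }[] = [
  { id: 1, total: 9.99 },
  { id: 2, total: 24.5 },
];

// Hallucinated shape: the loop binds `order`, but the body references
// the collection `orders`. TypeScript rejects this at compile time, but
// in plain JavaScript it would silently log `undefined` every iteration.
// for (const order of orders) {
//   console.log(orders.total);
// }

// Corrected: reference the loop item, not the collection.
for (const order of orders) {
  console.log(order.total);
}
```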
by dvh on 3/8/25, 8:07 PM
The moment I realized LLMs were better was when I needed to do something with the screen coordinates of point clouds in three.js and my searches led nowhere. Doing it myself would have taken me 1 or 2 hours; the LLM got correct working code on the first try.
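For reference, the standard approach here (a sketch of the usual three.js projection recipe, not necessarily the code the LLM produced) is to project each point into normalized device coordinates and map those to pixels:

```typescript
import * as THREE from "three";

// Convert one world-space point to screen (pixel) coordinates.
// Assumes the camera's matrices are up to date (updateMatrixWorld).
function toScreenCoords(
  point: THREE.Vector3,
  camera: THREE.Camera,
  width: number,
  height: number
): { x: number; y: number } {
  const ndc = point.clone().project(camera); // normalized device coords in [-1, 1]
  return {
    x: ((ndc.x + 1) / 2) * width,  // NDC x runs left to right
    y: ((1 - ndc.y) / 2) * height, // NDC y is up; screen y is down
  };
}

// For a point cloud, read each vertex out of the BufferGeometry,
// apply the object's world transform, then project.
function pointCloudScreenCoords(
  points: THREE.Points,
  camera: THREE.Camera,
  width: number,
  height: number
): { x: number; y: number }[] {
  const pos = points.geometry.getAttribute("position");
  const v = new THREE.Vector3();
  const out: { x: number; y: number }[] = [];
  for (let i = 0; i < pos.count; i++) {
    v.fromBufferAttribute(pos, i).applyMatrix4(points.matrixWorld);
    out.push(toScreenCoords(v, camera, width, height));
  }
  return out;
}
```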
by floppiplopp on 3/11/25, 8:55 AM
by patrick451 on 3/8/25, 8:27 PM
I have found them pretty helpful for writing SQL. But I don't really know SQL very well, and I'd imagine that somebody who does could write what I need in far less time than it takes me with the LLM. While the LLM helps me finish my SQL task faster, the downside is that I'm not really learning it the way I would if I had to actually bang my head against the wall and understand the docs. In the long run, I'd be better off without it.
by arjie on 3/8/25, 11:24 PM
by LunaSea on 3/8/25, 9:00 PM
by arthurofbabylon on 3/8/25, 8:39 PM
The nice thing about traction is that you can see it. When you shovel away the snow in your driveway, you can move your car; that's nice. When you upgrade your hot-water kettle and it boils in 30 seconds, that's traction. Traction is a freshly installed dishwasher in your kitchen.
I sincerely ask – not because I am skeptical but because I am curious – where is the traction with LLMs in software?
by nurettin on 3/9/25, 6:37 AM
by insane_dreamer on 3/8/25, 11:36 PM
It boosts productivity in the way that a good IDE boosts productivity, but nothing like 5x or even 2x. Maybe 1.2–1.4x.
by sumoboy on 3/8/25, 8:36 PM
by jmchuster on 3/8/25, 8:52 PM
But we've now lived it so much that it sounds ridiculous to try to argue that the internet doesn't really make _that_ much of a difference.
by rqtwteye on 3/8/25, 8:13 PM
But I can easily see a not so distant future where you don't even have to look at the code anymore and just let AI do its thing. Similar to us not checking the assembly instructions of compiled code.
by jasonthorsness on 3/8/25, 8:47 PM
by bhouston on 3/8/25, 11:39 PM
by furstenheim on 3/8/25, 9:08 PM
by mentalgear on 3/8/25, 7:52 PM
At least for mid- to high-complexity projects.
Vibe coding might be fun but ultimately results in unmaintainable code.
by yimby2001 on 3/9/25, 2:18 PM
by herbst on 3/9/25, 9:51 AM
by asdf6969 on 3/9/25, 12:52 AM
by janwillemb on 3/8/25, 9:38 PM
by bufordtwain on 3/8/25, 10:09 PM
by croes on 3/9/25, 12:53 AM
Seems to be the easiest measurement of any effect
by zellyn on 3/8/25, 11:25 PM
Most of the folks I've talked to about it have been trying it, but the majority of the stories are still ultimately failures.
There are exceptions though: there's been some success porting things between, say, JUnit4 and JUnit5.
The successes do seem to be coming more frequently, as the models improve, as the tools improve, as people develop the intuition, and as we invest time and attention to building out LLM-tailored documentation (my prediction here is that the task-focused, bite-sized documentation style that seems like a fit for LLMs will ultimately prove to be more useful to developers than a lot of the existing docs!)
On the part of the models improving, I expect it's going to be a bit like the ChatGPT 3.5 to 4 transition: there are certain almost intangible thresholds that when crossed can suddenly make a qualitative difference in usability and success.
I definitely feel like my emotions regarding LLMs are a bit of a roller coaster. I'm turning 50 these days, and some days feel like I would rather not completely upend my development practices! And the hype -- oh god, the hype -- is absolutely nauseating. Every CTO in existence told their teams to go rub some AI on everything. Every ad is telling you their AI is already working perfectly (it isn't).
But then I compete in our internal CTF and come in third even though the rest of my team bailed, because ChatGPT can write RISC-V assembly payloads and I didn't have to spend half an hour learning or re-learning it. Or I get Claude to write a JavaScript/SVG spline editor matching the diagramming-as-code system I'm using, in like 30 or 45 minutes. And it's miraculous. And things like Cursor, for just writing your code when you already know what you want… magical.
Here's the thing though. Despite the nauseating hype, we have to keep trying and trying to use AI well. There's a there there. The things are obviously unbelievably powerful. At some point, we're going to figure out how to use them effectively, and they're going to get good enough to do most of our low- to medium-complexity coding (at the very least). We're going to have to re-architect our software around AI. (As an example, instead of kludgy multi-platform solutions like React Native or Kotlin Multiplatform or J2ObjC, etc., why not make all the tests textual, and have an LLM translate changes in your Kotlin Android codebase into Swift automatically?)
We do still need sanity. I'm sort of half tongue-in-cheek trying to promulgate a rule that nobody can invoke AI to discount costs of expensive migrations until they've demonstrated an automated migration of the type in question with a medium-complexity codebase. We have to avoid waving our hands and saying, "Don't worry about that; AI will handle it," when it can't actually handle it yet.
But keep trying!
by smusamashah on 3/9/25, 5:49 AM
I use these tools to get help here and there with tiny code snippets. So far they have not suggested anything finely optimised to me. I guess it's because the bulk of the code they were trained on isn't optimised for performance.
Does anyone know if any current LLMs can generate super-optimised code (even assembly)? I don't think so. It doesn't feel like we're going to have machines more intelligent than us in the future if they're full of slop.
by dehrmann on 3/8/25, 9:06 PM
I suspect the metrics you sometimes hear like "x% of new code was written by an LLM" are being oversold because they're reported by people interested in juicing the numbers, so they count boilerplate, lines IDE autocomplete would have figured out, and lines that had to be fixed.
by semanticjudo on 3/8/25, 8:47 PM
by ilrwbwrkhv on 3/8/25, 9:31 PM
That stuff is already a mess, so the AI slop that comes out is also messy, and that's fine as long as it looks good, performs well, and does what I want. It's also really trivial to change.
However, I'm not letting it come near any backend code or actual development.
by simonswords82 on 3/8/25, 8:37 PM