by neural_thing on 3/12/24, 2:15 PM with 553 comments
by steve_adams_86 on 3/12/24, 4:32 PM
Even so, I realize the demos are still broad in scope and the results are incredible. Imagine seeing this even 2 years ago; it would have seemed like magic, you wouldn't have been able to believe it. Today it feels inevitable and entirely believable. There will be even better versions of this soon.
by nsypteras on 3/12/24, 5:04 PM
by ThalesX on 3/12/24, 4:53 PM
Just yesterday I tried to feed it a simple HTML page to extract a selector. I tried it with GPT-4 Turbo, I tried it with Claude, I tried it with Groq, I tried it with a local Llama 2 model with a 128k context window. None of them worked. This is a task that, while annoying, I do in about 10 seconds.
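For concreteness, the kind of task I mean is roughly this (a made-up page, assuming BeautifulSoup; not the actual code I was working with):

    # Hand-writing the selector takes seconds once you've seen the page:
    from bs4 import BeautifulSoup

    html = "<div class='listing'><span class='price'>$20</span></div>"
    soup = BeautifulSoup(html, "html.parser")
    price = soup.select_one("div.listing > span.price")
    print(price.text)  # -> $20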
Sure, I'm open to the possibility that somewhere between the next 2-3 days and a couple of years from now, I'll no longer do manual coding. But honestly, I'm starting to grow a bit irritated with all the hype.
Just give me a product that works as advertised and I'll throw money your way, because I have a lot more ideas than I have code throughput!
by the_newest on 3/12/24, 4:45 PM
It makes me question the truthfulness of the other claims.
by YeGoblynQueenne on 3/12/24, 9:17 PM
They'd better have really advanced reasoning and planning capabilities way beyond everything that anyone else knows how to do with LLMs. There's a growing body of literature that leaves no doubt that LLMs can't reason and can't plan.
For a quick summary of some such results see:
by pushedx on 3/12/24, 7:08 PM
He is one of a very small group of people (going back to 1989) to get a perfect raw score at the IOI, the olympiad for competitive programming.
https://stats.ioinformatics.org/people/2686
Glad to see that he's putting his (unbelievable) talents to use. To give you a sense, at the event where I met him, he solved 6 problems equivalent to Leetcode medium-to-hard problems in under 15 minutes (total), including reading the problems, implementing input parsing, debugging, and submitting the solutions.
by PodgieTar on 3/12/24, 8:36 PM
If anyone is practicing for their B1 Dutch exam, feel free to use this link to get the practice paper.
https://usacognition--serve-s3-files.modal.run/attachments/4...
by dakiol on 3/12/24, 7:26 PM
- deobfuscate complex requirements into well-divided chunks
- find gaps or holes in requirements so that I have to write the minimal amount of code
- understand codebases so that the implementation fits nicely
I don't need an "AI software engineer", I need an "AI people person who gives me well-defined tasks". Now sure, if you combine those two kinds of AIs I could perhaps become irrelevant.
by mlsu on 3/13/24, 1:41 AM
I'm curious what a large, mature codebase, with complex internals and legacy code, looks like after you sic Devin on it. Not pretty, I suspect. In fact, I think it will become so difficult to fix that nobody -- neither human nor Devin -- will be able to clean up the mess. By sheer volume, a broken ball of unfixable spaghetti.
I would be immensely pissed off if someone did this to an open source project of mine, or even to a closed-source codebase I'm working on. Not only would it not be useful, it would be moving backwards: creating an icky vomit mess that we would probably have to spend years cleaning up as bug reports and complaints from customers mount and competitors iterate faster.
Does that sound like something you want to deal with in your software business?
by RyEgswuCsn on 3/12/24, 8:19 PM
If you can tell whether a solution is correct or not -- well, then you don't need to have AI write it for you.
I think AI programming can only work when the industry begins to treat "almost working" systems backed by human customer service as acceptable.
by Oras on 3/12/24, 7:34 PM
> When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted.
While it is progress, it's far away from being useful as a software engineer.
by goat_whisperer on 3/12/24, 5:43 PM
"cars replaced horse drawn carriages. But we managed to adapt to that, the carriage drivers got new jobs."
My dudes. We are the HORSES being replaced in this scenario.
by pankajdoharey on 3/14/24, 7:33 AM
by aster0id on 3/12/24, 7:09 PM
by devinegan on 3/12/24, 7:02 PM
by HarHarVeryFunny on 3/12/24, 7:28 PM
Sure, one day we'll have AGI, and one day AGI will replace many jobs that can be done in front of a computer.
In the meantime, SOTA AI appears to be an airline chatbot that gets the company sued for lying to the customer. This is just basic question answering, and it can't even get that right. Would you trust it to write the autopilot code to fly the airplane? Maybe to write a tiny bit of it - just code up one function, perhaps?
I sure as hell wouldn't, and even when it can be trusted to write one function that meets requirements and has no bugs, it's still going to be a LONG way from replacing the developers who were given the task of "write us an autopilot".
by singularity2001 on 3/12/24, 7:49 PM
I wonder how much of this time was consumed by manually directing Devin in the right direction, manually fixing and undoing the mess Devin produced, and watching Devin burn through $$$. As others said, being completely non-transparent about this burns a bit of trust, but I'd really like to know where we are right now. Since Devin is currently "invite-only demos", a more realistic peek into the state of the art can be seen here: https://docs.sweep.dev/blogs/gpt-4-modification
My gut feeling (and limited experience): GPT-4 and other models are not quite there yet, but whoever prepares for the next generation of models now will eventually win big time. Or be replaced by simpler approaches.
by StickyRibbs on 3/14/24, 8:22 AM
No engineering company worth its salt is going to build a world-class technology business purely with generative AI in its current state. The risk in doing so currently is total and utter failure. I have a very hard time believing we're anywhere near that capability. Maybe your mom-and-pop startup could hire a prompt engineer to build a website and a simple tool, but we have yet to see those exercises surface into the mainstream; it's purely speculative.
I say, rest easy programmers. Your careers will be enriched more than axed with generative AI as a support tool for many years to come.
Also, if anyone who works in this field holds a strong opposing belief, consider that OpenAI engineers would then be programming themselves out of a job, which is obviously not the case.
by mellosouls on 3/12/24, 5:15 PM
https://news.ycombinator.com/item?id=36987454
Sweep is an open-source AI-powered junior developer
by mattlondon on 3/12/24, 4:26 PM
Instead it will mean that bosses can fire 75-90% of the (very expensive) engineers, with the ones who remain left to prompt the AI and clean up any mistakes/misunderstandings.
I guess this is the future. We've coded ourselves out of a job. People are smiling and celebrating all this - personally I find it kinda sad that we've basically put an end to software engineering as a career and put loads of people out of work. It is not just SWEs - it is impacting a lot of careers... I hope these researchers can sleep well at night, because they're dooming huge swathes of people to unemployment.
Are we about to enter a software engineering winter? People will find new careers, no kids will learn to code since AI can do it all. We'll end up with a load of AI researchers being "the new SWEs", but relying on AI to implement everything? Maybe that will work and we'll have a virtuous circle of AIs making AI improvements and we'll never need engineers again? Or maybe we'll hit a wall and progress in comp sci will essentially stop?
by senko on 3/12/24, 5:38 PM
Currently, mainstream AI usage in coding is at the level of assistants and glorified autocomplete. Which is great (I use GitHub Copilot daily), but for us working in the space it's obvious that the impact will be much larger. Besides us (Pythagora), there are also Sweep (mentioned by others in the comments) and GPT Engineer tackling the same problem, each, I believe, with a slightly different angle.
Our thesis is that human in the loop is key. In coding, you can think of LLMs as a very eager junior developer who can easily read Stack Overflow but doesn't really think twice before jumping to implementation. With guidance (a LOT of it in internal prompts, and some from the human) it can achieve spectacular results.
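To make "guidance" concrete, here's a toy sketch of the human-in-the-loop shape (illustrative only, not our actual internals; `llm` stands in for any completion call):

    # The internal prompt does the decomposition; the human gates it
    # before any code gets written.
    def run_task(llm, task: str) -> str:
        plan = llm(f"Break this coding task into small numbered steps:\n{task}")
        print(plan)
        if input("Proceed with this plan? [y/N] ").lower() != "y":
            feedback = input("What should change? ")
            plan = llm(f"Revise this plan:\n{plan}\n\nFeedback:\n{feedback}")
        return llm(f"Write the code for this plan:\n{plan}")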
by devinprater on 3/12/24, 4:39 PM
by lacoolj on 3/12/24, 4:42 PM
by jasfi on 3/12/24, 4:49 PM
The real-world eval benchmark puts Claude 2 way ahead of GPT-4, which doesn't sound right.
by matthewsinclair on 3/12/24, 11:57 PM
I remain open minded about what’s next and at the rate things are changing, I wouldn’t rule anything out a priori for now.
by pedalpete on 3/12/24, 10:15 PM
When I was using ChatGPT to help guide me through some coding tasks, I found it could create somewhat useful code, but where it fell down was that it would put things into loose variables that would have been better organized into a class. It is this structuring of a complete system which matters for any real software engineering, rather than just writing code.
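A trivial example of the pattern I mean (made up, not my actual code):

    from dataclasses import dataclass

    # What the model tended to give me: loose, parallel variables.
    sensor_name = "thermo-1"
    sensor_reading = 21.5
    sensor_unit = "C"

    # What a real system wants: the same state grouped with its behavior.
    @dataclass
    class Sensor:
        name: str
        reading: float
        unit: str = "C"

        def describe(self) -> str:
            return f"{self.name}: {self.reading}{self.unit}"

    print(Sensor("thermo-1", 21.5).describe())  # thermo-1: 21.5C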
by hiddencost on 3/12/24, 4:32 PM
Making false, grandiose claims like that burns a lot of trust.
Focus on execution and quality.
by DrAgOn200233 on 3/13/24, 7:41 AM
In addition, when using GPT-4, I use it only when I have new thoughts, so the GPU occupancy is low: I probably use less than 5 hours of GPU time each month. Devin is sort of like an intern working for you, so you would probably make it work at least 40 hrs/week, which is on the order of 170 GPU-hours a month.
This difference in GPU usage would probably make Devin 10 times more expensive for the business model to be profitable - that is, if they are using a subscription business model like GPT-4's.
I don't think there is any other viable business model for Devin - it certainly cannot replace or even reduce the number of human programmers, due to LLMs' unreliable nature and the necessity of code verification.
by Havoc on 3/13/24, 12:53 AM
Sure it is no senior architect but the trajectory is insane. Wasn’t that long ago that LLMs barely managed coherent poems. Now it’s troubleshooting code problems on its own?
Sure it's just a GPT-4 wrapper, but that implies the same can be done with GPT-5 and 6, etc.
Project it forward and that actually becomes non-trivial.
by epolanski on 3/12/24, 7:18 PM
Just let me try the goddamn product.
By the time you let me in, I don't care anymore, or another competitor has already caught my attention.
Neon, the Postgres-as-a-service, put me on such a long waitlist that by the time they invited me in, I was already on a completely different solution (and was happy with it).
by devinthenai on 3/13/24, 1:00 AM
by ein0p on 3/12/24, 8:44 PM
by gerash on 3/12/24, 7:01 PM
by LZ_Khan on 3/12/24, 9:48 PM
Side note: I'm kind of offended that something called 'Devin' is going to take my job. If you're going to replace me, at least let me keep my dignity by naming it something cool like 'Sora'.
by rafadc on 3/12/24, 4:30 PM
by bachittle on 3/12/24, 7:37 PM
by ramoz on 3/12/24, 8:43 PM
by swax on 3/12/24, 10:18 PM
https://www.youtube.com/watch?v=dHlv7Jl3SFI
The real problem is coherence (logic and consistency over time), which is what these wrappers try to address. I believe AI could probably be trained to be a lot more coherent out of the box, working with minimal wrapping... that is the AI I worry about.
by MichaelRazum on 3/12/24, 5:16 PM
by playmkr on 3/13/24, 3:52 AM
ok
by huimang on 3/12/24, 7:25 PM
I have tried using GPT-4 and Gemini extensively, and the amount of bullshit generated makes them unreliable if you don't already know the domain. These tools lack the critical stuff (being context-aware) and just make up libraries and APIs. Yet you can't be sure when they're bullshitting or not, making it an exercise in frustration for anything that's not trivial.
Save your money and buy an O'Reilly subscription.
by isodev on 3/13/24, 6:36 AM
by Bjorkbat on 3/12/24, 4:21 PM
Namely, the client was asking for something unusually specific (for Upwork). It was an almost perfect example of a job to be given to an AI agent for testing purposes.
by crucialfelix on 3/12/24, 7:25 PM
On a one-by-one basis I can use VS Code's GitHub Copilot to rewrite each one the way I want it.
What I want to do is iterate through all the functions in the files and transform each of them.
I know we are getting there, but does anybody know how that can be done right now?
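Something like this is the shape I have in mind (assuming Python files and the openai client; the model name and prompt are placeholders):

    # Walk every function in a source tree and ask an LLM to rewrite it.
    # Review the output by hand before applying anything.
    import ast
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()  # needs OPENAI_API_KEY in the environment

    def rewrite(func_src: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Rewrite this function in the style I want."},
                {"role": "user", "content": func_src},
            ],
        )
        return resp.choices[0].message.content

    for path in Path("src").rglob("*.py"):
        source = path.read_text()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef):
                print(rewrite(ast.get_source_segment(source, node)))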
by dukeyukey on 3/12/24, 4:52 PM
But a software engineer absolutely can buy access to AI services.
I have no idea how this will end up, but it'll be different to before.
by m3kw9 on 3/12/24, 7:49 PM
by meindnoch on 3/12/24, 8:04 PM
by adabaed on 3/12/24, 11:38 PM
by joeevans1000 on 3/13/24, 11:09 PM
So many comments about how insufficient the tool is.
Our heads are really in the sand, I'm afraid.
by pjmorris on 3/12/24, 4:47 PM
Devin may need some additional help for a while.
by syedmsawaid on 3/12/24, 7:18 PM
by hackerlight on 3/12/24, 4:29 PM
by erickmunene on 3/13/24, 8:32 AM
As someone passionate about the potential of AI in tech, I can't wait to see what amazing feats Devin will accomplish. And who knows, maybe one day, companies like Munesoft Technologies will reach similar heights with their own AI-driven advancements. Here's to a future filled with endless possibilities! #DevinAI
by zoomin on 3/14/24, 4:57 PM
by paradite on 3/12/24, 7:37 PM
It's not 100% automated, but it saves a lot of time spent writing code.
It works by composing prompts from task instructions, source code context, and formatting instructions, resulting in high-quality prompts that can be fed into LLMs to generate high-quality code.
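The composition step is roughly this shape (an illustrative sketch, not the tool's actual API):

    # Stitch task, code context, and output rules into one prompt.
    def compose_prompt(task: str, context_files: dict[str, str], format_rules: str) -> str:
        context = "\n\n".join(
            f"// {name}\n{content}" for name, content in context_files.items()
        )
        return (
            f"Task:\n{task}\n\n"
            f"Relevant source code:\n{context}\n\n"
            f"Output format:\n{format_rules}"
        )

    prompt = compose_prompt(
        task="Add input validation to the signup handler.",
        context_files={"handlers/signup.py": "..."},
        format_rules="Return a unified diff only.",
    )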
by symlinkk on 3/12/24, 9:19 PM
by cdeutsch on 3/20/24, 8:32 PM
by plinkplink on 3/15/24, 6:45 PM
Business owners have been getting sexually aroused at the prospect of taking a developer's salary and putting it in their own pockets for decades. Each iteration of this wet dream has only locked businesses into "low code" systems that require even more highly-specialized developers to operate. Right now, and probably for a while, Devin et al. are on par with the drag-and-drop automagical app-building snake oil stuff.
LLMs are useful to help developers be more productive, which does translate to lay-offs, but until someone creates an AI that can translate the absolute fevered gibberish that comes out of business people's heads into a profitable piece of software, this is just MS FrontPage v100.0.
Just like an entire industry sprang up around fixing WordPress websites that business owners thought they could do themselves, pretty soon we'll start seeing job postings for AI-Generated Spaghetti Unravellers.
I'm (half seriously) imagining a future where software engineers are mostly consultants who show up and talk with business folks, then talk with the local robot, and get the project to actually work. Bill $1k per hour.
by heldrida on 3/13/24, 3:47 PM
by ellis0n on 3/12/24, 8:26 PM
by devinthenai on 3/13/24, 12:42 AM
https://docs.google.com/document/d/1byJgu1G_M58QVWmpZeEDthyA...
by andythedev on 3/13/24, 9:57 PM
by shreshth398495 on 3/13/24, 2:02 PM
by __lbracket__ on 3/13/24, 1:26 AM
by datavirtue on 3/12/24, 4:26 PM
by asasasa123 on 3/13/24, 12:36 PM
by globular-toast on 3/12/24, 8:04 PM
by cxmcc on 3/12/24, 8:40 PM
by cvhashim04 on 3/12/24, 4:30 PM
by emawa on 3/13/24, 1:09 PM
by xzfyes2 on 3/20/24, 4:33 AM
by cp9 on 3/12/24, 5:34 PM
by gnarcoregrizz on 3/12/24, 6:59 PM
UBI is a pipe dream... it's not happening. The wealth and means of production won't be shared in any meaningful capacity. Wealth inequality can get a whole lot worse.
by MSFT_Edging on 3/12/24, 4:50 PM
For every technological advancement, artisans are the first to be made obsolete.
Sure, we have landfills full of unworn textiles, and the market says it's good, but overall we keep destroying what allows humans to seek meaning.
Our governments and society have made it clear, if you don't produce value, you don't deserve dignity.
We have outsourced art to computers, so people who don't understand art can have it at their fingertips.
Now we're outsourcing engineering so those who don't understand it can have it done for cheap.
We hear stories of those who don't understand therapy suggesting AI can be a therapist, of those who don't understand medicine suggesting AI can replace a doctor.
What will be left? Where will we be? Just stranded without dignity or purpose, left to rot when we no longer produce value.
I ask this question often, in multiple contexts, but to what end? Who benefits from these advancements? The CEO and shareholders, sure, but just because something can be had for cheaper doesn't mean it improves lives. Our clothes barely last a year, our shoes fall apart. Our devices come with pre-destined expiration dates.
Where will we be in the future? Those born into money can continue passing it around, a cargo cult for the numbers going up. But what about everyone else?
by xyst on 3/12/24, 6:30 PM
by rohandakua on 3/13/24, 10:01 AM
by ij09j901023123 on 3/12/24, 5:52 PM
by ridruejo on 3/12/24, 2:59 PM
by preommr on 3/12/24, 7:34 PM
Currently these models don't provide an adequate confidence measure, which prevents them from reaching their full potential. In the next few years we're going to reach a point where models will be able to tell whether something is possible and avoid hallucinating, guaranteeing much better correctness. Something like that would be absolutely killer.
If you add a top-down approach using a framework, such that it can architect a system down into small individual components, then that's a recipe for a really great workflow. The models we have now really shine at automated unit tests and small bits of code that stay within the limits of the context size. Making the interfaces obvious enough, and being able to glue things together through obvious connections, seems very possible.
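As a sketch of what I mean by obvious interfaces (all names made up): fix small contracts up front, and each implementation becomes a self-contained unit a model could fill in and test independently:

    from typing import Protocol

    # The architecture layer pins down tiny, obvious contracts...
    class Tokenizer(Protocol):
        def tokenize(self, text: str) -> list[str]: ...

    class Counter(Protocol):
        def count(self, tokens: list[str]) -> dict[str, int]: ...

    # ...so each implementation fits comfortably in one model context.
    class WhitespaceTokenizer:
        def tokenize(self, text: str) -> list[str]:
            return text.split()

    class SimpleCounter:
        def count(self, tokens: list[str]) -> dict[str, int]:
            counts: dict[str, int] = {}
            for t in tokens:
                counts[t] = counts.get(t, 0) + 1
            return counts

    # Glue code only touches the interfaces, never the internals.
    def word_frequencies(text: str, tok: Tokenizer, ctr: Counter) -> dict[str, int]:
        return ctr.count(tok.tokenize(text))

    print(word_frequencies("a b a", WhitespaceTokenizer(), SimpleCounter()))  # {'a': 2, 'b': 1}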
I really do think that in the next few years we're going to see one of these tools really do well.