from Hacker News

Diff Models – A New Way to Edit Code

by sadiq on 1/28/23, 11:04 AM with 199 comments

  • by pavlov on 1/28/23, 12:00 PM

    Somehow these GitHub-trained ML code assistants sadden me.

    My idea of enjoyable high-quality programming isn’t to dip a spoon into an ocean of soup made of other people’s random design decisions and bugs accumulated over fifteen years, hoping to get a spoonful without hidden crunchy insect bits.

    I know the soup is nutritious and healthy 98% of the time, and eating it saves so much time compared to preparing a filet mignon myself. But it’s still brown sludge.

  • by RjQoLCOSwiIKfpm on 1/28/23, 12:22 PM

    Prepare for household appliances - washing machines etc. - doing strange things randomly.

    Prepare for the same thing with electronics you didn't previously think of as containing much software - central heating units, AC units, fridges, stoves, light switches, LED light bulbs, vacuum cleaners, electric shavers, electric toothbrushes, kids' toys, microwave ovens, really anything that consumes electricity.

    Prepare for the support of the vendors of those appliances not taking phone calls anymore, only text communication.

    Prepare for the support not understanding the random problems you encounter.

    Prepare for the answers you get from support being similarly random.

    And maybe, with an unknown probability, prepare for your house burning down and nobody being able to tell you why.

  • by Kwantuum on 1/28/23, 4:19 PM

    A lot of the comments seem to talk about the inevitable AI event horizon, but unless I'm misreading the article, the results are flat-out bad. Even the 6-billion-parameter model barely scratches a 50% success rate on a tiny problem that is trivial to fix for any human with basic knowledge of programming. Note the log scale of the graph.

  • by startupsfail on 1/28/23, 12:48 PM

    From the safety perspective (which may become important soon), it is perhaps a very bad idea to allow easy execution/injection of arbitrary code into random places with little review.

    One of the first steps a misaligned/unhelpful/virus-type system would take while attempting to secure its presence would likely be gaining inference/GPU/TPU compute access. Code injection is one vector; there are multiple others.

    When designing such systems, please do keep that in mind. Make sure code changes are properly signed and the originating models are traceable.

    Same applies to datasets generated by models.
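
    As a rough illustration of what that could look like (my own sketch, not from the article): hash the patch together with a provenance record and sign it, e.g. with Ed25519 via Python's cryptography package. The metadata fields here are hypothetical.

      import json

      from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

      # Hypothetical provenance record attached to a model-generated patch.
      provenance = {
          "model": "CarperAI/diff-codegen-350m-v2",  # originating model
          "prompt_sha256": "<hash of the prompt that produced the diff>",
          "diff": "@@ -1,4 +1,5 @@\n+import sys\n import argparse\n",
      }

      # In practice the signing key would belong to the review/CI system.
      signing_key = Ed25519PrivateKey.generate()
      payload = json.dumps(provenance, sort_keys=True).encode()
      signature = signing_key.sign(payload)

      # Anyone holding the public key can verify the patch and its provenance
      # before applying it; verify() raises InvalidSignature on tampering.
      signing_key.public_key().verify(signature, payload)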

  • by jakear on 1/28/23, 2:51 PM

    Excellent. This is the beginning of the end for the cohort of people writing clear, descriptive commit messages. All your knowledge is soon to be acquired and commodified by the Man with the GPU.

    I on the other hand will survive: what sense is an AI to make of such classic messages as David Bowie's excellent "ch-ch-changes!", the five "fix CI maybe???"s in a row, or the eternal "fuck this shit"?

  • by PoignardAzur on 1/28/23, 3:36 PM

    We're still at the beginning with these tools, but they're already demonstrating some really exciting capabilities.

    Something I haven't seen explored too much: navigation help. One of the things that takes me the most time when coding is remembering which file / module / function I need to edit next and jumping to it.

    An autocomplete engine that suggested jump locations instead of tokens could help me stay in the flow much longer, with fewer worries about whether I'm introducing subtle bugs because I'm relying on the AI too much.

  • by abhijeetpbodas on 1/28/23, 2:28 PM

    On a philosophical level, AI for writing code has always seemed redundant to me. Here's why:

    1. Humans create programming languages which machines can understand. OK.

    2. Humans build tools (LSP, treesitter, tags, type checkers and others) to help humans understand code better. OK.

    3. Humans build (AI) programs which run on machines so that the computer can understand... computer programs???

    Aren't computers supposed to be able to understand code already? Wasn't the concept of "computer code" created so as to have something the computer could understand? Isn't making an (AI) program to help the computer understand computer programs re-inventing the wheel?

    (Of course, I get that I use the terms "understand" and "computer programs" very loosely here!)

  • by lettergram on 1/28/23, 2:21 PM

    I view programming as a trade. I’ve spent years honing my skills, I pass wisdom to junior engineers as I can. I review code and provide detailed alternatives.

    My concern with AI across all fields is that people won't gain the fundamental skills necessary for moving the bounds of what's possible. Certainly, tools like this AI could produce good results. However, the underlying humans are still providing the training data. More importantly, humans are setting the trajectory of development.

    If humans are no longer capable of pushing the AI systems forward, then the AI systems will either cease to improve, or they will learn to play off each other. In highly complex systems like many programs, I suspect they'll play off each other and settle into local minima/maxima. That is, because the "game" (program development) can be iterative, they'll constantly improve the code. However, because the AI systems don't interact with all data (particularly real-world data), when a customer shows a sad face at some UI/UX, the AI won't develop a new feature that matches the customer's desires.

    Where I fear this will leave us is with a class of less-skilled engineers and overly optimized AI. Basically, stuck in development.

  • by DominikPeters on 1/28/23, 1:14 PM

    It would have been helpful to show some example generations of the model, unless I've missed them.

  • by ilaksh on 1/28/23, 5:40 PM

    Since I am building a website https://aidev.codes to do programming based on natural language descriptions, this is extremely relevant to me.

    OpenAI has an 'edit' endpoint, but it's 'in beta' and limited to 10-20 requests per minute. They do not acknowledge support requests about this. Azure OpenAI also has this endpoint, I think, but they ignore me as well.

    So for my edits, just like everything else, I have been relying on text-davinci-003, since it has much more feasible rate limits. I have just been having it output the full new file, but maybe it's possible to leverage this unified-diff approach.

    Does anyone know the easiest way to run their 6B diff model against my own prompts for my service? Maybe Hugging Face?
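
    If the checkpoints are on the Hugging Face Hub, I imagine something like this minimal sketch would be a starting point (untested; I'm assuming the 350M checkpoint name from elsewhere in the thread and the <NME>/<BEF>/<MSG>/<DFF> prompt format):

      import transformers

      # Untested sketch: load one of the diff models from the Hugging Face Hub.
      # The 350M checkpoint keeps memory requirements modest.
      checkpoint = "CarperAI/diff-codegen-350m-v2"
      model = transformers.AutoModelForCausalLM.from_pretrained(checkpoint)
      tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)

      # The models expect the file name, old file contents, and commit
      # message in the <NME>/<BEF>/<MSG>/<DFF> prompt format.
      prompt = (
          "<NME> add.py\n"
          "<BEF> def add(a, b):\n"
          "    return a - b\n"
          "<MSG> Fix subtraction bug in add()\n"
          "<DFF>\n"
      )

      inputs = tokenizer(prompt, return_tensors="pt")
      output = model.generate(inputs.input_ids, max_new_tokens=128)
      print(tokenizer.decode(output[0], skip_special_tokens=True))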

  • by moconnor on 1/28/23, 1:59 PM

    All that, only to end with “no meaningful improvement over the Salesforce CodeGen model”, is a bit disappointing.

    Negative results are interesting in their own right. I’d rather read about why this isn’t better at the 6B-parameter level than see a hand-wave that, well, the samples are more diverse and, look, the 350M model is better.

  • by mortehu on 1/29/23, 2:14 AM

    I wrote the program between <BEF> and <MSG>, and it generated the following output:

      <NME> diff_model.py
      <BEF> import argparse
    
      import torch
      import transformers
    
      def main():
          argparser = argparse.ArgumentParser()
          argparser.add_argument('--checkpoint', default='CarperAI/diff-codegen-2b-v2', choices=['CarperAI/diff-codegen-6b-v2', 'CarperAI/diff-codegen-2b-v2', 'CarperAI/diff-codegen-350m-v2'], help='Model to use')
          args = argparser.parse_args()
    
          model = transformers.AutoModelForCausalLM.from_pretrained(args.checkpoint)
          tokenizer = transformers.AutoTokenizer.from_pretrained(args.checkpoint)
    
          # Use CUDA.
          model = model.cuda()
    
          text = '<NME> diff_model.py\n<BEF> '
          text += open("diff_model.py").read()
          text += '<MSG> Print a message after loading the model\n<DFF>\n'
    
          completion = model.generate(tokenizer.encode(text, return_tensors="pt").cuda(), max_new_tokens=400)
    
          completion = tokenizer.decode(completion[0], skip_special_tokens=True)
    
          print(completion)
    
    
      if __name__ == '__main__':
          main()
      <MSG> Print a message after loading the model
      <DFF>
      <DFF> @@ -1,4 +1,5 @@
       '''
      +import sys
       import argparse
    
       import torch
      @@ -8,6 +9,7 @@ def main():
           tokenizer = transformers.AutoTokenizer.from_pretrained(args.checkpoint)
    
           # Use CUDA.
      +    print("CUDA:", torch.cuda.is_available())
           model = model.cuda()
    
           text = '<NME> diff_model.py\n<BEF> '
      @@ -10,3 +12,4 @@ def main():
           completion = tokenizer.decode(completion[0], skip_special_tokens=True)
    
           print(completion)
      +    sys.exit(0)
      \ No newline at end of file
      [\ No newline at end of file repeated many times]
    
    It takes about 150 seconds to run on a 3090 Ti when the model is already on disk.

  • by Epa095 on 1/28/23, 6:53 PM

    Maybe this can give a boost to languages like Idris or F*, where you can specify much stronger types than in normal languages (at the price that you might have to prove the types manually). The types can help "tame" the AI-generated code, and the AI can help generate the proofs.

    I also wonder if it could be useful in creating Coq proofs!

  • by wslh on 1/28/23, 2:10 PM

    Very opportune. I am working on security diffs (before and after security audit commits) [1], so I'm reading the whole piece.

    [1] https://news.ycombinator.com/item?id=34360102

  • by parasti on 1/28/23, 1:48 PM

    I skimmed the post, but it seems not much was said about how the original diffs are generated. Git generates diffs only on request, with varying levels of accuracy depending on the options given. Sometimes the diff completely fails to capture the intent of the change: it shows the path from A to B, but not in any semantically meaningful way. For instance, the same edit can come out quite differently depending on which diff algorithm git is asked to use, as in the sketch below.
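
    A quick illustration (my own, not from the post), using git's built-in algorithm choices:

      import subprocess

      # The same working-tree change, rendered by each of git's diff
      # algorithms; the hunks can differ in shape and readability.
      # "some_file.py" is a placeholder path.
      for algorithm in ("myers", "minimal", "patience", "histogram"):
          print(f"--- diff-algorithm={algorithm} ---")
          subprocess.run(
              ["git", "diff", f"--diff-algorithm={algorithm}", "--", "some_file.py"],
              check=True,
          )
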
  • by ec109685 on 1/28/23, 8:23 PM

    2022: engineers with 3 jobs

    2023: engineers with their own AI model, typing “#fixed bugs” and spending the rest of the day by the pool.

  • by Jackson__ on 1/29/23, 12:01 AM

    I'm not sure if I'm just imagining it, but there seems to be a lot more negative push-back online against this than there was against Copilot.

    It makes me wonder if it's related to the recent protests in other creative fields in response to AI models, or just a weird dislike of openly released model weights?

  • by abdnafees on 1/28/23, 1:15 PM

    Why now? I mean, it's been only 20-odd years since modern programming became popular, and that's not a lot. Let people learn how to code, make mistakes, and then learn from those mistakes. Pre-cooked meals are not as good as home-cooked goodness.

  • by indeyets on 1/28/23, 11:49 AM

    So, is it loosely the same as Copilot? I understand the approach is a tad different, but the results of converting natural-language descriptions into code changes should be comparable.

    And both are trained on a large corpus of GitHub sources.

    Is there a way to test it somehow? Public API maybe?

  • by pklausler on 1/29/23, 1:50 AM

    How good are these LLMs going to be at debugging code, as opposed to writing it?

  • by spapas82 on 1/28/23, 2:06 PM

    I'd really like to see how this would work with my commits... 99% of the messages on my commits are single word, similar to:

    - ok

    - fix

    - done

    - test

    - nice

  • by tbrownaw on 1/28/23, 4:34 PM

    Sounds like basically the inverse of what was on here the other day about automatically generating commit messages from a diff.

    Sounds kinda cool, even if trusting it would be a terrible idea.

  • by leo2023 on 1/28/23, 6:06 PM

    The next idea after this could be: developers draw a system diagram of the architecture, then the AI writes the whole system end to end, high-performance and distributed.

  • by shul on 1/29/23, 12:21 PM

    Why all the hate? I for one welcome our AI overlords.

  • by shireboy on 1/28/23, 1:11 PM

    If this thing is trained on my commit messages, we're all doomed. Or else we'll be able to type "fixed the thing" and have a whole app written.