by marcopicentini on 8/24/23, 1:26 PM with 501 comments
by daemonologist on 8/24/23, 5:34 PM
Here's some output from q4_0 quantization of CodeLlama-7b-Python (first four lines are the prompt):
# prints the first ten prime numbers
def print_primes():
    i = 2
    num_printed = 0 # end of prompt
    while num_printed < 10:
        if is_prime(i):
            print(i)
            num_printed += 1
        i += 1

def is_prime(n):
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def main():
    print_primes()

if __name__ == '__main__':
    main()
It will be interesting to see how the larger models perform, especially after community tuning and with better context/prompting.
by redox99 on 8/24/23, 2:41 PM
> The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens.
Edit: Reading the paper, key retrieval accuracy really deteriorates after 16k tokens, so it remains to be seen how useful the 100k context is.
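Key-retrieval evals of this kind are usually synthetic: a short "passkey" sentence is buried in a long stretch of filler, and the model is asked to repeat the key back; accuracy is then plotted against context length. A minimal sketch of building such a prompt (the filler sentence and phrasing here are illustrative, not the paper's exact setup):

```python
import random

def make_passkey_prompt(passkey: str, n_filler: int, seed: int = 0) -> str:
    """Bury a passkey inside filler text, as in synthetic key-retrieval evals.

    The model under test is asked to repeat the passkey back; retrieval
    accuracy is then measured as a function of total context length.
    """
    rng = random.Random(seed)
    filler = ["The grass is green. The sky is blue. The sun is yellow."] * n_filler
    # Insert the key sentence at a random position in the filler.
    pos = rng.randrange(len(filler) + 1)
    filler.insert(pos, f"The pass key is {passkey}. Remember it.")
    return "\n".join(filler) + "\nWhat is the pass key?"
```

Sweeping `n_filler` upward is what exposes the kind of degradation past 16k tokens that the paper reports.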
by up6w6 on 8/24/23, 2:29 PM
https://ai.meta.com/blog/code-llama-large-language-model-cod...
by reacharavindh on 8/24/23, 2:05 PM
I wonder if we could make such specific LLMs (one that is proficient in all things Rust, another in all things Linux, all things genomics, all things physics modeling, etc.) and have them talk to each other to collaboratively solve problems.
That would be a crazy future thing! Putting machines truly to work..
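The orchestration layer for that idea can be sketched very simply: a dispatcher that routes each question to a domain specialist. The "models" below are stub functions standing in for separate fine-tuned LLMs behind their own endpoints, and the keyword routing is a deliberately naive placeholder:

```python
# Toy sketch of routing a question to a domain-specialist model.
# Each expert here is a stub; in practice each would be a separate
# fine-tuned LLM reachable over its own inference endpoint.

def rust_expert(q):   return f"[rust-model] answering: {q}"
def linux_expert(q):  return f"[linux-model] answering: {q}"
def general(q):       return f"[general-model] answering: {q}"

KEYWORDS = {
    "borrow": rust_expert, "cargo": rust_expert,
    "kernel": linux_expert, "systemd": linux_expert,
}

def route(question: str) -> str:
    # Naive keyword dispatch; a real system might use a classifier,
    # embeddings, or let the specialists bid on the question.
    for word, expert in KEYWORDS.items():
        if word in question.lower():
            return expert(question)
    return general(question)
```

Getting the specialists to genuinely collaborate (rather than just be dispatched to) is the harder, open part of the idea.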
by Palmik on 8/24/23, 4:17 PM
by syntaxing on 8/24/23, 4:31 PM
[1] https://huggingface.co/TheBloke/CodeLlama-13B-Python-fp16
by jmorgan on 8/24/23, 5:02 PM
ollama run codellama "write a python function to add two numbers"
More models coming soon (completion, python and more parameter counts)
by benvolio on 8/24/23, 2:48 PM
Not a bad context window, but makes me wonder how embedded code models would pick that context when dealing with a codebase larger than 100K tokens.
And this makes me further wonder if, when coding with such a tool (or at least a knowledge that they’re becoming more widely used and leaned on), are there some new considerations that we should be applying (or at least starting to think about) when programming? Perhaps having more or fewer comments, perhaps more terse and less readable code that would consume fewer tokens, perhaps different file structures, or even more deliberate naming conventions (like Hungarian notation but for code models) to facilitate searching or token pattern matching of some kind. Ultimately, in what ways could (or should) we adapt to make the most of these tools?
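For codebases past the context limit, the usual shape of a solution is retrieval: score chunks of the repo against the current query and pack the best ones into the window. A minimal sketch, using word overlap as a stand-in for real embedding similarity and whitespace splitting as a stand-in for a real tokenizer:

```python
def pick_context(chunks, query, budget_tokens, n_tokens=lambda s: len(s.split())):
    """Greedy context selection: score chunks by keyword overlap with the
    query, then pack the best-scoring ones until the token budget is spent.
    Real systems would use embeddings and a proper tokenizer; this only
    sketches the shape of the problem."""
    q_words = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    picked, used = [], 0
    for chunk in scored:
        cost = n_tokens(chunk)
        if used + cost <= budget_tokens:
            picked.append(chunk)
            used += cost
    return picked
```

Under a scheme like this, the commenter's speculation cuts both ways: terser code spends fewer tokens, but comments and deliberate naming give the retrieval step more to match on.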
by lordnacho on 8/24/23, 3:54 PM
Is anyone working on a code AI that can suggest refactorings?
"You should pull these lines into a function, it's repetitive"
"You should change this structure so it is easier to use"
Etc
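The "pull these lines into a function" suggestion doesn't even need an LLM to prototype: detecting repeated runs of lines is enough to flag candidates. A rough sketch (a real tool would work on the AST, normalize identifiers, and rank by how much code the extraction saves):

```python
from collections import defaultdict

def repeated_blocks(source: str, window: int = 3, min_count: int = 2):
    """Find runs of `window` consecutive (stripped, non-empty) lines that
    appear at least `min_count` times -- candidates for 'pull these lines
    into a function'. A real tool would work on the AST, not raw lines."""
    lines = [l.strip() for l in source.splitlines() if l.strip()]
    seen = defaultdict(list)
    for i in range(len(lines) - window + 1):
        seen[tuple(lines[i:i + window])].append(i)
    return {block: pos for block, pos in seen.items() if len(pos) >= min_count}
```

An LLM's value-add on top of something like this would be the second kind of suggestion: judging whether a structure is awkward to use, which pure pattern matching can't do.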
by Draiken on 8/24/23, 4:22 PM
I absolutely love the idea of using one of these models without having to upload my source code to a tech giant.
by scriptsmith on 8/24/23, 3:05 PM
I see the GitHub Copilot extension gets a new release every few days, so is it just that the way they're integrated is more complicated and not worth the effort?
by mymac on 8/24/23, 3:48 PM
by modeless on 8/24/23, 2:24 PM
by ilaksh on 8/24/23, 2:33 PM
I would argue that many teams will have to reevaluate their LLM strategy _again_ for the second time in a week.
by WhitneyLand on 8/24/23, 9:01 PM
I guess since Xcode doesn’t have a good plug-in architecture for this I began experimenting more with a chat interface.
So far gpt-4 has seemed quite useful for generating code, reviewing code for certain problems, etc.
by 1024core on 8/24/23, 8:01 PM
by gorbypark on 8/24/23, 7:28 PM
It might be the quantization or my lacklustre prompting skills affecting it, though. To be fair I did get it to output a little bit of useful code after trying a few times.
by TheRealClay on 8/25/23, 1:27 AM
by KaiserPro on 8/24/23, 9:11 PM
Where it's a bit shit is when it's used to provide autosuggest. It hallucinates plausible-sounding functions/names, which for me personally are hard to spot if they are wrong (I suspect that's a function of the plugin)
by natch on 8/24/23, 3:59 PM
by jasfi on 8/24/23, 2:56 PM
by ilaksh on 8/24/23, 2:43 PM
by andrewjl on 8/24/23, 3:54 PM
I haven't yet read the whole paper (nor have I looked at the benchmark docs, which might well cover this), but I'm curious how these benchmarks are designed to avoid issues with overfitting. My thinking here is that canned algorithm-type problems common in software engineering interviews are probably over-represented in the training data used for these models. That might point to artificially better performance by LLMs versus their performance on the more domain-specific tasks they'd be used for in day-to-day work.
[1] https://github.com/openai/human-eval
[2] https://github.com/google-research/google-research/tree/mast...
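For context on how HumanEval [1] scores models: it reports pass@k, and the standard unbiased estimator (from the paper that introduced the benchmark) is worth seeing, since naively sampling k completions per problem gives a biased estimate:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper: given n
    samples for a problem, of which c pass the tests, the probability
    that at least one of k randomly drawn samples passes is
    1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k failures: any k-draw contains a pass
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The overfitting worry above applies regardless of the estimator, of course: pass@k only measures performance on the benchmark's own problem distribution.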
by msoad on 8/24/23, 2:09 PM
by dangerwill on 8/24/23, 3:35 PM
You can scream that this is progress all you want, and I'll grant you that these tools will greatly speed up the generation of code. But more code won't make any of these businesses provide better services to people, lower their prices, or pay workers more. They are just a means to keep money from flowing out of the hands of the C-Suite and investor classes.
If software engineering becomes a solved problem then fine, we probably shouldn't continue to get paid huge salaries to write it anymore, but please stop acting like this is a better future for any of us normal folks.
by MuffinFlavored on 8/24/23, 2:35 PM
by e12e on 8/24/23, 5:00 PM
I see both vscode and netbeans have a concept of "inference URL" - are there any efforts like language server (lsp) - but for inference?
by pmarreck on 8/24/23, 9:16 PM
by robertnishihara on 8/26/23, 12:58 AM
by brucethemoose2 on 8/24/23, 2:30 PM
https://ai.meta.com/research/publications/code-llama-open-fo...
by naillo on 8/24/23, 2:46 PM
by braindead_in on 8/24/23, 2:56 PM
by awwaiid on 8/25/23, 4:35 AM
by pelorat on 8/26/23, 11:17 AM
by bick_nyers on 8/24/23, 3:26 PM
by dchuk on 8/25/23, 2:37 AM
by 1024core on 8/24/23, 3:26 PM
What?!? No Befunge[0], Brainfuck or Perl?!?
[0] https://en.wikipedia.org/wiki/Befunge
/just kidding, of course!
by jtwaleson on 8/24/23, 8:32 PM
by akulbe on 8/25/23, 12:27 AM
I got a bridge, but it was the wrong size.
Thanks, in advance.
by dontupvoteme on 8/24/23, 9:34 PM
by gdcbe on 8/24/23, 4:40 PM
by ai_g0rl on 8/25/23, 1:05 AM
by RobKohr on 9/1/23, 9:56 PM
by rafaelero on 8/24/23, 7:31 PM
by mdaniel on 8/24/23, 3:40 PM
by WaitWaitWha on 8/24/23, 5:02 PM
Asking for purposes of educating non-technologists.
by eurekin on 8/24/23, 2:34 PM
by m00nsome on 8/25/23, 7:22 AM
by KingOfCoders on 8/25/23, 8:10 AM
by born-jre on 8/24/23, 4:49 PM
I can see some people fine-tuning it again for general-purpose instruct.
by bryanlyon on 8/24/23, 4:27 PM
I do wonder how much use it'll get, seeing as running a heavy language model on local hardware is kinda unlikely for most developers. Not everyone is running a system powerful enough to host big AIs like this. I also doubt that companies are going to set up large AIs for their devs. It's just a weird positioning.
by bracketslash on 8/24/23, 6:01 PM
by the-alchemist on 8/24/23, 3:21 PM
by maccam912 on 8/24/23, 2:23 PM
by marcopicentini on 8/24/23, 4:00 PM
by binary132 on 8/24/23, 3:04 PM
by jerrygoyal on 8/25/23, 5:46 AM
by waitingkuo on 8/24/23, 1:35 PM
by mercurialsolo on 8/24/23, 4:34 PM
by Dowwie on 8/24/23, 3:14 PM
by gw67 on 8/24/23, 1:33 PM
by praveenhm on 8/24/23, 5:54 PM
by nothrowaways on 8/25/23, 1:17 AM
by likenesstheft on 8/24/23, 3:07 PM
by jrh3 on 8/24/23, 9:48 PM
by Someone1234 on 8/24/23, 8:01 PM
- Easy plug & play model installation, and trivial to change which model once installed.
- Runs a local web server, so I can interact with it via any browser
- Ability to feed a model a document or multiple documents and be able to ask questions about them (or build a database of some kind?).
- Absolute privacy guarantees. Nothing goes off-machine from my prompt/responses (USP over existing cloud/online ones). Routine license/update checks are fine though.
I'm not trying to throw shade at the existing ways of running LLMs locally, just saying there may be room for an OPTIONAL commercial piece of software in this space. Most of them are designed for academics to do academic things. I am talking about a turn-key piece of software for everyone else that can give you an "almost" ChatGPT or "almost" Copilot-like experience for a one-time fee, one that you can feed sensitive private information to.
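The "feed it documents and ask questions" item on that wishlist typically reduces to retrieval plus generation. A toy sketch of the retrieval half, using stdlib string similarity in place of real embeddings (a local assistant would feed the top-scoring passages to the model as context, and nothing would leave the machine):

```python
import difflib

def best_passage(docs: dict, question: str):
    """Tiny sketch of document Q&A retrieval: split each doc into
    sentences and return the (doc name, sentence) most similar to the
    question. Real systems would embed passages and do vector search;
    difflib stands in here purely for illustration."""
    best, best_score = ("", ""), 0.0
    for name, text in docs.items():
        for sentence in text.split("."):
            score = difflib.SequenceMatcher(
                None, question.lower(), sentence.lower().strip()).ratio()
            if score > best_score:
                best, best_score = (name, sentence.strip()), score
    return best
```

The privacy guarantee on the wishlist is easy to satisfy in a design like this, since both retrieval and generation run on local files and a local model.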
by lolinder on 8/24/23, 2:23 PM
The only thing I've been able to think is they're trying to commoditize this new category before Microsoft and Google can lock it in, but where to from there? Is it just to block the others from a new revenue source, or do they have a longer game they're playing?
by rvnx on 8/24/23, 3:27 PM
In the meantime, we are still waiting for Google to show what they have (according to their research papers, they are beating others).
> User: Write a loop in Python that displays the top 10 prime numbers.
> Bard: Sorry I am just an AI, I can't help you with coding.
> User: How to ask confirmation before deleting a file?
> Bard: To ask confirmation before deleting a file, just add -f to the rm command.
(real cases)
by 6stringmerc on 8/24/23, 2:47 PM
No thanks, going back to Winamp.