from Hacker News

A brief history of LLaMA models

by andrewon on 4/28/23, 2:26 AM with 84 comments

  • by jiggawatts on 4/29/23, 10:19 PM

    It keeps saying the phrase “model you can run locally”, but despite days of trying, I failed to compile any of the GitHub repos associated with these models.

    None of the Python dependencies are strongly versioned, and “something” happened to the CUDA compatibility of one of them about a month ago. The original developers “got lucky” but now nobody else can compile this stuff.

    After years of using only C# and Rust, both of which have sane package managers with semantic versioning, lock files, reproducible builds, and even SHA checksums, the Python package ecosystem looks ridiculously immature and even childish.

    Seriously, can anyone here build a docker image for running these models on CUDA? I think right now it’s borderline impossible, but I’d be happy to be corrected…
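
    For comparison, here is roughly what a lock check looks like in plain Python; a hand-rolled sketch, with hypothetical package pins rather than anything taken from these repos:

        # Hypothetical sketch: fail fast if the installed packages don't match
        # a set of pins, the way a lock file would in Cargo or NuGet.
        import sys
        from importlib.metadata import PackageNotFoundError, version

        PINNED = {
            "torch": "2.0.0",           # hypothetical pins, not from any repo
            "transformers": "4.28.1",
            "sentencepiece": "0.1.98",
        }

        def check_env(pins):
            ok = True
            for name, wanted in pins.items():
                try:
                    found = version(name)
                except PackageNotFoundError:
                    print(f"{name}: not installed (pinned {wanted})")
                    ok = False
                    continue
                if found != wanted:
                    print(f"{name}: found {found}, pinned {wanted}")
                    ok = False
            return ok

        if __name__ == "__main__":
            sys.exit(0 if check_env(PINNED) else 1)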

  • by doodlesdev on 4/29/23, 10:48 PM

       > Our system thinks you might be a robot!
       We're really sorry about this, but it's getting harder and harder to tell the difference between humans and bots these days.
    
    Yeah, fuck you too. Come on, really, why put this in front of a _blog post_? Is it that hard to keep up with the bot requests when serving a static page?

  • by vessenes on 4/29/23, 10:35 PM

    Most places that recommend llama.cpp for Mac fail to mention https://github.com/jankais3r/LLaMA_MPS, which runs unquantized 7B and 13B models on the M1/M2 GPU directly. It's slightly slower (not by a lot) and uses significantly less energy. To me, the win of not having to quantize while not melting a hole in my lap is huge; I wish more people knew about it.
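
    (The MPS backend is just a PyTorch device; a minimal sketch of what "runs on the M1/M2 GPU directly" means, not the LLaMA_MPS code itself, assuming a recent PyTorch build with Metal support:)

        # Minimal sketch: select Apple's Metal (MPS) backend when available,
        # the device LLaMA_MPS targets; falls back to CPU otherwise.
        import torch

        device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
        x = torch.randn(1, 4096, device=device)  # dummy activation tensor
        print(x.device)  # "mps" on an M1/M2 with a recent PyTorch build
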
  • by simonw on 4/29/23, 8:15 PM

    I'm running Vicuna (a LLaMA variant) on my iPhone right now. https://twitter.com/simonw/status/1652358994214928384

    The same team that built that iPhone app - MLC - also got Vicuna running directly in a web browser using WebGPU: https://simonwillison.net/2023/Apr/16/web-llm/

  • by brucethemoose2 on 4/28/23, 3:34 AM

    There is also CodeCapybara (7B finetuned on code competitions), the "uncensored" Vicuna, OpenAssistant 13B (which is said to be very good), various non-English tunes, medalpaca... the release pace is maddening.

  • by brianjking on 4/30/23, 1:46 AM

    I'll never understand why everyone is spending so much time on a model you cannot use commercially (at all).

    Secondly, most of us can't even use the model for research or personal use, given the license.

  • by FloatArtifact on 4/29/23, 8:52 PM

    There needs to be a site dedicated to tracking all these models, with regular updates.

  • by foobarbecue on 4/30/23, 12:44 AM

    Ok I gotta know... what's the art?