from Hacker News

OpenLLaMA: An Open Reproduction of LLaMA

by sadiq on 5/3/23, 6:43 AM with 180 comments

  • by diimdeep on 5/3/23, 8:04 PM

    To use with llama.cpp on CPU and 8GB RAM

      # build llama.cpp (CMake) and install the Python deps for the conversion script
      git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && cmake -B build && cmake --build build
      python3 -m pip install -r requirements.txt

      # fetch the OpenLLaMA 7B preview weights (needs git-lfs), convert them to ggml f16, then quantize to q5_0
      cd models && git clone https://huggingface.co/openlm-research/open_llama_7b_preview_200bt/ && cd -
      python3 convert-pth-to-ggml.py models/open_llama_7b_preview_200bt/open_llama_7b_preview_200bt_transformers_weights 1
      ./build/bin/quantize models/open_llama_7b_preview_200bt/open_llama_7b_preview_200bt_transformers_weights/ggml-model-f16.bin models/open_llama_7b_preview_200bt_q5_0.ggml q5_0
      # run a prompt against the quantized model
      ./build/bin/main -m models/open_llama_7b_preview_200bt_q5_0.ggml --ignore-eos -n 1280 -p "Building a website can be done in 10 simple steps:" --mlock
  • by logicchains on 5/3/23, 8:11 AM

    It's not clear from the GitHub repo: are there any plans to eventually train the 30 or 65 billion parameter LLaMA models? The 65B model seems comparable to GPT-3.5 for many things, and can run fine on a beefy desktop just on CPU (CPU RAM is much cheaper than GPU RAM). It'd be amazing to have an open-source version.
  • by jjice on 5/3/23, 2:47 PM

    Does anyone have any resources they'd recommend for understanding the base terminology of models like this? I always see the terms "weights", "tokens", "model", etc. I feel like I understand what these mean, but I have no idea why I need to care about them for open models like this. If I were to download an open model to run on my machine, would I be downloading the weights? I'm just ignorant of the ML space, I guess, and not sure where to start.
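
    A hedged aside, since the question is concrete enough to answer with code (the huggingface_hub helper and the preview repo name below are my own example, not something from this thread): in practice, "downloading a model" mostly means downloading its weight files, plus a tokenizer and a small config that tell the runtime how to use them.

      # List the files that make up the OpenLLaMA preview "model" on Hugging Face:
      # a config, tokenizer files, and the weight checkpoints themselves.
      # Assumes `pip install huggingface_hub`.
      from huggingface_hub import list_repo_files
      for f in list_repo_files("openlm-research/open_llama_7b_preview_200bt"):
          print(f)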
  • by superpope99 on 5/3/23, 12:27 PM

    I'm always curious about the cost of these training runs. Some back of the envelope calculations:

    > Overall we reach a throughput of over 1900 tokens / second / TPU-v4 chip in our training run

    1 trillion / 1,900 ≈ 526,315,789 chip-seconds ≈ 150,000 chip-hours.

    Assuming "on-demand" pricing [1] that's about $500,000 training cost.

    [1] https://cloud.google.com/tpu/pricing
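
    As a sanity check on the arithmetic above, a minimal sketch in Python (the $3.22 per chip-hour on-demand TPU v4 rate is my own assumption from the pricing page; treat the result as order-of-magnitude only):

      # Rough training-cost estimate from the quoted throughput.
      tokens = 1_000_000_000_000         # 1 trillion tokens
      tokens_per_chip_second = 1900      # reported throughput per TPU-v4 chip
      chip_hours = tokens / tokens_per_chip_second / 3600
      usd_per_chip_hour = 3.22           # assumed on-demand price per TPU v4 chip-hour
      print(f"{chip_hours:,.0f} chip-hours, ~${chip_hours * usd_per_chip_hour:,.0f}")
      # -> roughly 146,199 chip-hours, ~$470,760, i.e. on the order of $500,000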

  • by quickthrower2 on 5/3/23, 10:43 AM

    I am quite new to this and would like to get it running. Would the process roughly be:

    1. Get a machine with decent GPU, probably rent cloud GPU.

    2. On that machine download the weights/model/vocab files from https://huggingface.co/openlm-research/open_llama_7b_preview...

    3. Install Anaconda. Clone https://github.com/young-geng/EasyLM/.

    4. Install EasyLM:

        conda env create -f scripts/gpu_environment.yml
        conda activate EasyLM
    
    5. Run this command, as per https://github.com/young-geng/EasyLM/blob/main/docs/llama.md:

        python -m EasyLM.models.llama.llama_serve \
             --mesh_dim='1,1,-1' \
             --load_llama_config='13B' \
             --load_checkpoint='params::path/to/easylm/llama/checkpoint'
    
    Am I even close?
  • by newswasboring on 5/3/23, 9:23 AM

    How is this model performing better than LLaMA on a lot of tasks[1] even though it's trained on a fifth of the data (200 billion vs. 1 trillion tokens)?

    [1] https://github.com/openlm-research/open_llama#evaluation

  • by logicchains on 5/3/23, 9:46 AM

    It would be very interesting to see https://github.com/BlinkDL/RWKV-LM trained on the same data.
  • by Taek on 5/3/23, 12:08 PM

    How is this different from what RedPajama is doing?

    Also, most people don't mind running LLaMA 7B at home, since the license is effectively unenforceable there, but a lot of commercial businesses would love to run a 65B-parameter model and can't, because the license is much more meaningfully prohibitive in a business context. Open versions of the larger models are a lot more meaningful to society at this point.

  • by bluecoconut on 5/3/23, 8:48 AM

    Really exciting how fast fully pre-trained new models are appearing.

    Here's another repo (with the same "Open-Llama" name, but a different training dataset) that has also been available on Hugging Face for a few weeks:

    https://github.com/s-JoL/Open-Llama
    https://huggingface.co/s-JoL/Open-Llama-V1

  • by LudwigNagasena on 5/3/23, 9:06 AM

    Is anyone familiar with the BOINC-style grid-computing scene for ML and, specifically, LLMs? Is there something interesting going on, or is it infeasible? Will things like OpenLLaMA help it?
  • by Eduard on 5/3/23, 10:59 AM

    Can someone explain how to tell if a model doesn't require a GPU and can run on a CPU?

    After setting up dalai, OpenAssistant, gpt4all, and a bunch of other (albeit non-working) LLM thingies, my current hunch is:

    if the model has "GGML" somewhere in its name, it doesn't require a GPU.

  • by martythemaniak on 5/3/23, 6:31 PM

    Has anyone successfully used embeddings with anything other than OpenAI's APIs? I've seen lots of debates on using embeddings vs. fine-tuning for things like chatbots on private data, but is there a reason why you can't use both? I.e., fine-tune LLaMA on your data, then run the same embeddings approach on top of your own fine-tuned model?
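
    A hedged sketch of the non-OpenAI half of that idea (sentence-transformers and the embedding model name are my own choices here, nothing specific to OpenLLaMA): embed the private documents locally, retrieve the closest one for a query, and feed it into the prompt of whatever model you run, fine-tuned or not.

      # Local embeddings + retrieval; assumes `pip install sentence-transformers`.
      from sentence_transformers import SentenceTransformer, util

      embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model
      docs = ["Invoices are due within 30 days.", "Support is available 9-5 CET."]
      doc_emb = embedder.encode(docs, convert_to_tensor=True)

      query_emb = embedder.encode("When do I have to pay?", convert_to_tensor=True)
      best = util.cos_sim(query_emb, doc_emb).argmax().item()
      context = docs[best]  # this snippet would go into the (fine-tuned) LLM's prompt
      print(context)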
  • by ianpurton on 5/3/23, 10:32 AM

    > We are currently focused on completing the training process on the entire RedPajama dataset.

    So that's 1.2 trillion tokens. Nice.

  • by jasonm23 on 5/3/23, 9:22 AM

    Forgive my ignorance, but can a model be fine-tuned on a specific codebase, after, say, training on all the standard docs for the language, third-party libs, and so on?

    I have no formal idea how this is done, but my assumption is that "something like that" should work.

    Please disabuse me of any silly ideas.

  • by quickthrower2 on 5/3/23, 9:18 AM

    So is this free as in “do what you f’ing like with it”?
  • by venelin_valkov on 5/6/23, 4:22 PM

    I made a YouTube video on how to run OpenLLaMA on Google Colab with Hugging Face Transformers (using a T4 GPU): https://www.youtube.com/watch?v=1NOPciKuQb8

    Hope that helps!
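
    For readers who prefer text to video, a minimal sketch of the same Transformers approach (not taken from the video; the repo and subfolder names come from the llama.cpp comment above and may have changed since, and `accelerate` is assumed for device_map):

      # Run the OpenLLaMA 7B preview with Hugging Face Transformers.
      # Assumes `pip install torch transformers accelerate sentencepiece`.
      import torch
      from transformers import LlamaForCausalLM, LlamaTokenizer

      repo = "openlm-research/open_llama_7b_preview_200bt"
      sub = "open_llama_7b_preview_200bt_transformers_weights"

      # Slow sentencepiece tokenizer; the auto-converted fast one was reported to mis-tokenize.
      tokenizer = LlamaTokenizer.from_pretrained(repo, subfolder=sub)
      model = LlamaForCausalLM.from_pretrained(
          repo, subfolder=sub, torch_dtype=torch.float16, device_map="auto"
      )

      prompt = "Building a website can be done in 10 simple steps:"
      inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
      out = model.generate(**inputs, max_new_tokens=64)
      print(tokenizer.decode(out[0], skip_special_tokens=True))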

  • by version_five on 5/4/23, 6:51 PM

    Has anyone actually used this? I poked around, and it's so poorly documented that I don't see how one can readily figure out how to do a minimal run, short of going through the code.
  • by scotty79 on 5/3/23, 9:13 AM

    Motivation?