from Hacker News

OpenLLaMA: An Open Reproduction of LLaMA

by sadiq on 5/3/23, 6:43 AM with 180 comments

  • by diimdeep on 5/3/23, 8:04 PM

    To use with llama.cpp on CPU and 8GB RAM

      # build llama.cpp (CMake) and install the Python deps for the conversion script
      git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && cmake -B build && cmake --build build
      python3 -m pip install -r requirements.txt

      # fetch the OpenLLaMA 7B preview weights (needs git-lfs), convert them to ggml f16, then quantize to q5_0
      cd models && git clone https://huggingface.co/openlm-research/open_llama_7b_preview_200bt/ && cd -
      python3 convert-pth-to-ggml.py models/open_llama_7b_preview_200bt/open_llama_7b_preview_200bt_transformers_weights 1
      ./build/bin/quantize models/open_llama_7b_preview_200bt/open_llama_7b_preview_200bt_transformers_weights/ggml-model-f16.bin models/open_llama_7b_preview_200bt_q5_0.ggml q5_0
      # run a prompt against the quantized model
      ./build/bin/main -m models/open_llama_7b_preview_200bt_q5_0.ggml --ignore-eos -n 1280 -p "Building a website can be done in 10 simple steps:" --mlock
  • by logicchains on 5/3/23, 8:11 AM

    It's not clear from the GitHub repo: are there any plans to eventually train the 30 or 65 billion parameter LLaMA models? The 65B model seems comparable to GPT-3.5 for many things, and can run fine on a beefy desktop just on CPU (CPU RAM is much cheaper than GPU RAM). It'd be amazing to have an open-source version.
  • by jjice on 5/3/23, 2:47 PM

    Does anyone have any resources they'd recommend for understanding the base terminology of models like this? I always see the terms "weights", "tokens", "model", etc. I feel like I understand what these mean, but I have no idea why I need to care about them for open models like this. If I were to download an open model to run on my machine, would I be downloading the weights? I'm just ignorant of the ML space, I guess, and not sure where to start.
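
    A hedged aside, since the question is concrete enough to answer with code (the huggingface_hub helper and the preview repo name below are my own example, not something from this thread): in practice, "downloading a model" mostly means downloading its weight files, plus a tokenizer and a small config that tell the runtime how to use them.

      # List the files that make up the OpenLLaMA preview "model" on Hugging Face:
      # a config, tokenizer files, and the weight checkpoints themselves.
      # Assumes `pip install huggingface_hub`.
      from huggingface_hub import list_repo_files
      for f in list_repo_files("openlm-research/open_llama_7b_preview_200bt"):
          print(f)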
  • by superpope99 on 5/3/23, 12:27 PM

    I'm always curious about the cost of these training runs. Some back of the envelope calculations:

    > Overall we reach a throughput of over 1900 tokens / second / TPU-v4 chip in our training run

    1 trillion / 1,900 ≈ 526,315,789 chip-seconds ≈ 150,000 chip-hours.

    Assuming "on-demand" pricing [1] that's about $500,000 training cost.

    [1] https://cloud.google.com/tpu/pricing
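
    As a sanity check on the arithmetic above, a minimal sketch in Python (the $3.22 per chip-hour on-demand TPU v4 rate is my own assumption from the pricing page; treat the result as order-of-magnitude only):

      # Rough training-cost estimate from the quoted throughput.
      tokens = 1_000_000_000_000         # 1 trillion tokens
      tokens_per_chip_second = 1900      # reported throughput per TPU-v4 chip
      chip_hours = tokens / tokens_per_chip_second / 3600
      usd_per_chip_hour = 3.22           # assumed on-demand price per TPU v4 chip-hour
      print(f"{chip_hours:,.0f} chip-hours, ~${chip_hours * usd_per_chip_hour:,.0f}")
      # -> roughly 146,199 chip-hours, ~$470,760, i.e. on the order of $500,000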

  • by quickthrower2 on 5/3/23, 10:43 AM

    I am quite new to this and would like to get it running. Would the process roughly be:

    1. Get a machine with decent GPU, probably rent cloud GPU.

    2. On that machine download the weights/model/vocab files from https://huggingface.co/openlm-research/open_llama_7b_preview...

    3. Install Anaconda. Clone https://github.com/young-geng/EasyLM/.

    4. Install EasyLM:

        conda env create -f scripts/gpu_environment.yml
        conda activate EasyLM
    
    5. Run this command, as per https://github.com/young-geng/EasyLM/blob/main/docs/llama.md:

        python -m EasyLM.models.llama.llama_serve \
             --mesh_dim='1,1,-1' \
             --load_llama_config='13B' \
             --load_checkpoint='params::path/to/easylm/llama/checkpoint'
    
    Am I even close?
  • by newswasboring on 5/3/23, 9:23 AM

    How is this model performing better than LLaMA on a lot of tasks[1] even though it's trained on a fifth of the data (200 billion vs. 1 trillion tokens)?

    [1] https://github.com/openlm-research/open_llama#evaluation

  • by logicchains on 5/3/23, 9:46 AM

    It would be very interesting to see https://github.com/BlinkDL/RWKV-LM trained on the same data.
  • by Taek on 5/3/23, 12:08 PM

    How is this different from what RedPajama is doing?

    Also, most people don't mind running LLaMA 7B at home, since the license is effectively unenforceable there, but a lot of commercial businesses would love to run a 65B-parameter model and can't, because the license is much more meaningfully prohibitive in a business context. Open versions of the larger models are a lot more meaningful to society at this point.

  • by bluecoconut on 5/3/23, 8:48 AM

    Really exciting how fast fully pre-trained new models are appearing.

    Here's another repo (with the same "Open-Llama" name, but a different training dataset) that has also been available on Hugging Face for a few weeks:

    https://github.com/s-JoL/Open-Llama
    https://huggingface.co/s-JoL/Open-Llama-V1

  • by LudwigNagasena on 5/3/23, 9:06 AM

    Is anyone familiar with the BOINC-style grid-computing scene for ML and, specifically, LLMs? Is there something interesting going on, or is it infeasible? Will things like OpenLLaMA help it?
  • by Eduard on 5/3/23, 10:59 AM

    Can someone explain how to tell if a model doesn't require a GPU and can run on a CPU?

    After setting up dalai, OpenAssistant, gpt4all, and a bunch of other (albeit non-working) LLM thingies, my current hunch is:

    if the model has "GGML" somewhere in its name, it doesn't require a GPU.

  • by martythemaniak on 5/3/23, 6:31 PM

    Has anyone successfully used embeddings with anything other than OpenAI's APIs? I've seen lots of debates on using embeddings vs. fine-tuning for things like chatbots on private data, but is there a reason why you can't use both? I.e., fine-tune LLaMA on your data, then run the same embeddings approach on top of your own fine-tuned model?
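
    A hedged sketch of the non-OpenAI half of that idea (sentence-transformers and the embedding model name are my own choices here, nothing specific to OpenLLaMA): embed the private documents locally, retrieve the closest one for a query, and feed it into the prompt of whatever model you run, fine-tuned or not.

      # Local embeddings + retrieval; assumes `pip install sentence-transformers`.
      from sentence_transformers import SentenceTransformer, util

      embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model
      docs = ["Invoices are due within 30 days.", "Support is available 9-5 CET."]
      doc_emb = embedder.encode(docs, convert_to_tensor=True)

      query_emb = embedder.encode("When do I have to pay?", convert_to_tensor=True)
      best = util.cos_sim(query_emb, doc_emb).argmax().item()
      context = docs[best]  # this snippet would go into the (fine-tuned) LLM's prompt
      print(context)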
  • by ianpurton on 5/3/23, 10:32 AM

    > We are currently focused on completing the training process on the entire RedPajama dataset.

    So that's 1.2 trillion tokens. Nice.

  • by jasonm23 on 5/3/23, 9:22 AM

    Forgive my ignorance, but can a model be fine-tuned on a specific codebase, after, say, training on all the standard docs for the language, third-party libs, and so on?

    I have no formal idea how this is done, but my assumption is that "something like that" should work.

    Please disabuse me of any silly ideas.

  • by quickthrower2 on 5/3/23, 9:18 AM

    So is this free as in “do what you f’ing like with it”?
  • by venelin_valkov on 5/6/23, 4:22 PM

    I made a YouTube video on how to run OpenLLaMA on Google Colab with Hugging Face Transformers (using a T4 GPU): https://www.youtube.com/watch?v=1NOPciKuQb8

    Hope that helps!
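
    For readers who prefer text to video, a minimal sketch of the same Transformers approach (not taken from the video; the repo and subfolder names come from the llama.cpp comment above and may have changed since, and `accelerate` is assumed for device_map):

      # Run the OpenLLaMA 7B preview with Hugging Face Transformers.
      # Assumes `pip install torch transformers accelerate sentencepiece`.
      import torch
      from transformers import LlamaForCausalLM, LlamaTokenizer

      repo = "openlm-research/open_llama_7b_preview_200bt"
      sub = "open_llama_7b_preview_200bt_transformers_weights"

      # Slow sentencepiece tokenizer; the auto-converted fast one was reported to mis-tokenize.
      tokenizer = LlamaTokenizer.from_pretrained(repo, subfolder=sub)
      model = LlamaForCausalLM.from_pretrained(
          repo, subfolder=sub, torch_dtype=torch.float16, device_map="auto"
      )

      prompt = "Building a website can be done in 10 simple steps:"
      inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
      out = model.generate(**inputs, max_new_tokens=64)
      print(tokenizer.decode(out[0], skip_special_tokens=True))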

  • by version_five on 5/4/23, 6:51 PM

    Has anyone actually used this? I poked around, and it's so poorly documented that I don't see how one can readily figure out how to do a minimal run, short of going through the code.
  • by scotty79 on 5/3/23, 9:13 AM

    Motivation?