by sadiq on 5/3/23, 6:43 AM with 180 comments
by diimdeep on 5/3/23, 8:04 PM
# Build llama.cpp
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && cmake -B build && cmake --build build
# Install the Python dependencies for the conversion script
python3 -m pip install -r requirements.txt
# Download the OpenLLaMA 7B preview weights (200B-token checkpoint)
cd models && git clone https://huggingface.co/openlm-research/open_llama_7b_preview_200bt/ && cd -
# Convert the PyTorch weights to ggml f16 format (the trailing 1 selects f16)
python3 convert-pth-to-ggml.py models/open_llama_7b_preview_200bt/open_llama_7b_preview_200bt_transformers_weights 1
# Quantize to 5-bit (q5_0) to shrink memory use
./build/bin/quantize models/open_llama_7b_preview_200bt/open_llama_7b_preview_200bt_transformers_weights/ggml-model-f16.bin models/open_llama_7b_preview_200bt_q5_0.ggml q5_0
# Run inference on the quantized model
./build/bin/main -m models/open_llama_7b_preview_200bt_q5_0.ggml --ignore-eos -n 1280 -p "Building a website can be done in 10 simple steps:" --mlock
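If you'd rather drive the quantized model from Python than from ./main, here's a minimal sketch using the llama-cpp-python bindings (an assumption on my part: they're a separate pip install, and you'd need a version from that era that still reads the old .ggml format):

# Minimal sketch: run the q5_0 model via llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(model_path="models/open_llama_7b_preview_200bt_q5_0.ggml")
out = llm("Building a website can be done in 10 simple steps:", max_tokens=128)
print(out["choices"][0]["text"])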
by logicchains on 5/3/23, 8:11 AM
by jjice on 5/3/23, 2:47 PM
by superpope99 on 5/3/23, 12:27 PM
> Overall we reach a throughput of over 1900 tokens / second / TPU-v4 chip in our training run
1 trillion / 1900 ≈ 526,315,789 chip-seconds, and 526,315,789 / 3600 ≈ 146,000 chip-hours.
Assuming "on-demand" pricing [1], that's about $500,000 in training cost.
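Checking that arithmetic in a few lines of Python (the $3.22/chip-hour on-demand TPU v4 rate is my assumption; the comment's [1] pricing link isn't reproduced here):

# Back-of-envelope training cost: plain arithmetic, no external data.
tokens = 1e12              # 1 trillion training tokens
tok_per_chip_sec = 1900    # throughput quoted in the post
usd_per_chip_hour = 3.22   # assumed on-demand TPU v4 rate, stands in for [1]

chip_hours = tokens / tok_per_chip_sec / 3600
print(f"{chip_hours:,.0f} chip-hours, ~${chip_hours * usd_per_chip_hour:,.0f}")
# -> 146,199 chip-hours, ~$470,760 -- i.e. about $500k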
by quickthrower2 on 5/3/23, 10:43 AM
1. Get a machine with decent GPU, probably rent cloud GPU.
2. On that machine download the weights/model/vocab files from https://huggingface.co/openlm-research/open_llama_7b_preview...
3. Install Anaconda. Clone https://github.com/young-geng/EasyLM/.
4. Install EasyLM:
conda env create -f scripts/gpu_environment.yml
conda activate EasyLM
5. Run this command, as per https://github.com/young-geng/EasyLM/blob/main/docs/llama.md (the docs example says '13B', but for this model it should be '7B'):
python -m EasyLM.models.llama.llama_serve \
  --mesh_dim='1,1,-1' \
  --load_llama_config='7B' \
  --load_checkpoint='params::path/to/easylm/llama/checkpoint'
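Once the server is up, a smoke test might look like the sketch below; note that the port, endpoint path, and JSON shape are guesses on my part, not EasyLM's documented API:

# Hypothetical client for the EasyLM server -- endpoint and payload are assumptions.
import requests

resp = requests.post(
    "http://localhost:5007/generate",  # port and path: assumed, check EasyLM docs
    json={"prefix_text": ["Building a website can be done in 10 simple steps:"]},
)
print(resp.json())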
Am I even close?
by newswasboring on 5/3/23, 9:23 AM
by logicchains on 5/3/23, 9:46 AM
by Taek on 5/3/23, 12:08 PM
Also, most people don't mind running LLaMA 7B at home, since the license is hard to enforce against individuals, but a lot of commercial businesses would love to run a 65B-parameter model and can't, because the license is meaningfully prohibitive in a business context. Open versions of the larger models are a lot more meaningful to society at this point.
by bluecoconut on 5/3/23, 8:48 AM
Here's another repo (with the same "open-llama" name) that has been available on hugging face as well for a few weeks. (different training dataset)
https://github.com/s-JoL/Open-Llama https://huggingface.co/s-JoL/Open-Llama-V1
by LudwigNagasena on 5/3/23, 9:06 AM
by Eduard on 5/3/23, 10:59 AM
After setting up dalai, OpenAssistant, gpt4all, and a bunch of other (often non-working) LLM tools, my current hunch is:
if the model has "GGML" somewhere in its name, it doesn't require a GPU.
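The name is just a convention, though; the format itself is detectable. ggml-family files start with a little-endian uint32 magic, so a small check looks like this (the magic constants are llama.cpp's "ggml"/"ggmf"/"ggjt" values as I recall them; verify against the source):

# Sniff whether a file is a ggml-family model by its 4-byte magic.
# Constants from llama.cpp as remembered -- verify before relying on this.
import struct, sys

GGML_MAGICS = {0x67676D6C, 0x67676D66, 0x67676A74}  # "ggml", "ggmf", "ggjt"

with open(sys.argv[1], "rb") as f:
    (magic,) = struct.unpack("<I", f.read(4))
print("looks like a ggml-family file" if magic in GGML_MAGICS else "not ggml")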
by martythemaniak on 5/3/23, 6:31 PM
by ianpurton on 5/3/23, 10:32 AM
So that's 1.2 trillion tokens. Nice.
by jasonm23 on 5/3/23, 9:22 AM
I have no formal idea how this is done, but my assumption is that "something like that" should work.
Please disabuse me of any silly ideas.
by quickthrower2 on 5/3/23, 9:18 AM
by venelin_valkov on 5/6/23, 4:22 PM
Hope that helps!
by version_five on 5/4/23, 6:51 PM
by scotty79 on 5/3/23, 9:13 AM