from Hacker News

Cookbook: Finetuning Llama 2 in your own cloud environment, privately

by covi on 8/2/23, 6:50 PM with 13 comments

  • by andrewmutz on 8/2/23, 7:37 PM

    Does anyone know how to estimate the cost of inference using your own Llama 2 model? This article talks about the cost of fine-tuning it, but not what to expect when running it in production for inference.

    In particular, it would be great to know how the inference cost compares to GPT-3.5 Turbo and GPT-4.
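    A rough way to frame the self-hosted side of that comparison is dollars per generated token: divide the GPU's hourly rental price by its measured token throughput. The sketch below uses purely illustrative numbers (the $/hr rate and tokens/s are assumptions, not benchmarks); plug in your own cloud quote and a measured throughput.

    ```python
    # Back-of-envelope inference cost for a self-hosted model.
    # The GPU price and throughput are illustrative assumptions,
    # not measured figures.

    def cost_per_1k_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
        """Dollars per 1,000 generated tokens at steady-state utilization."""
        tokens_per_hour = tokens_per_second * 3600
        return gpu_hourly_usd / tokens_per_hour * 1000

    # Hypothetical example: one rented GPU at $2.00/hr serving ~30 tokens/s.
    print(f"${cost_per_1k_tokens(2.00, 30):.4f} per 1K tokens")
    ```

    Note this assumes the GPU is kept busy; at low or bursty utilization the effective cost per token is much higher, which is where per-token API pricing can win.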

  • by zhwu on 8/2/23, 7:03 PM

    This is the operational guide underlying the latest Vicuna-1.5 release: https://twitter.com/lmsysorg/status/1686794639469371393

  • by ripvanwinkle on 8/2/23, 8:18 PM

    Can fine-tuning replace the retrieval step? That is, is it possible to fine-tune the model so it knows all of my organization's knowledge, letting us skip the retrieval step during a chat about the data?

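    For context, the retrieval step in question is: score documents against the user's query, take the top hits, and paste them into the prompt. A minimal self-contained sketch (a toy word-overlap score stands in for a real embedding model, and the documents are invented examples):

    ```python
    # Toy retrieval step: rank documents by word overlap with the query,
    # then build a context-augmented prompt. A real system would use
    # embeddings and a vector index instead of this overlap score.

    def score(query: str, doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))

    def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
        return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

    docs = [
        "Expense reports are due on the 5th of each month.",
        "The VPN endpoint is vpn.example.com.",
    ]
    question = "when are expense reports due"
    context = retrieve(question, docs)[0]
    prompt = f"Context: {context}\nQuestion: {question}"
    print(prompt)
    ```

    Fine-tuning would have to bake that same information into the weights, which is why retrieval is usually kept for facts that change or must be attributable.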
  • by dang on 8/2/23, 9:35 PM

    Related ongoing thread:

    Run Llama 2 uncensored locally - https://news.ycombinator.com/item?id=36973584 - Aug 2023 (148 comments)

  • by bestcoder69 on 8/2/23, 9:35 PM

    Looking for a Llama 2 fine-tune guide specific to Apple silicon, if anyone has one. I wanna see how big a model I can tune on my 64 GB Mac Studio.
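    A rough way to bound "how big" is to estimate the memory footprint of adapter-style (LoRA) fine-tuning over quantized base weights. The sketch below is a back-of-envelope estimate under stated assumptions (4-bit base weights, an adapter sized at ~1% of the model, ~16 bytes per trainable parameter for weights, gradients, and optimizer state); it ignores activations and framework overhead, so treat the numbers as a lower bound.

    ```python
    # Back-of-envelope memory estimate for LoRA-style fine-tuning on a
    # unified-memory machine. All constants are illustrative assumptions;
    # real usage adds activations and framework overhead.

    def lora_memory_gb(params_billion: float, weight_bits: int = 4,
                       adapter_fraction: float = 0.01) -> float:
        base = params_billion * 1e9 * weight_bits / 8            # quantized weights
        # adapter weights + gradients + optimizer moments, ~16 bytes/param
        adapter = params_billion * 1e9 * adapter_fraction * 16
        return (base + adapter) / 1e9

    for size in (7, 13, 34, 70):
        print(f"Llama-2 {size}B: ~{lora_memory_gb(size):.1f} GB minimum")
    ```

    Under these assumptions even 70B lands in the mid-40s of GB before overhead, which is why 64 GB machines are usually described as borderline for 70B adapter tuning and comfortable for 34B and below.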