from Hacker News

Ask HN: Fine tuning a 65B LLM vs. fine tuning task specific million size models

by OthmaneHamzaoui on 8/22/23, 8:05 PM with 0 comments

Hi HN,

Ex-ML engineer here, so feel free to go deep in your answers.

Everywhere I look today (Medium, Reddit, Twitter), everyone is talking about fine-tuning LLMs: how the future is taking billion-parameter models, fine-tuning them, and then distilling them into specialised LLMs that perform specific tasks (e.g. sentiment analysis, Q&A, summarisation).
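For readers unfamiliar with the distillation step mentioned above, here is a minimal sketch of soft-target distillation in plain Python: the small "student" model is trained to match the large "teacher" model's temperature-softened output distribution. All numbers are made up for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened
    output distributions; the student minimises this during training."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy example: a student whose logits track the teacher's gets a lower loss
teacher = [3.0, 1.0, 0.2]
close_student = [2.8, 1.1, 0.3]
far_student = [0.1, 2.0, 1.5]
assert distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student)
```

In practice this loss is usually mixed with an ordinary cross-entropy term on hard labels, but the core idea is just matching softened distributions.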

1/ When people speak about fine-tuning, are they actually re-training the LLM (i.e. updating its weights), or mostly using techniques like few-shot prompting and Retrieval-Augmented Generation (RAG)?
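To make the distinction concrete: few-shot prompting and RAG both leave the model's weights untouched and only shape the input. A toy sketch (real systems use dense embeddings and a vector index rather than word overlap; all names and documents here are hypothetical):

```python
def retrieve(query, documents, k=1):
    """Rank documents by naive word-overlap with the query.
    Stands in for embedding similarity search in a real RAG system."""
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, documents, examples):
    """Few-shot examples plus retrieved context, prepended to the query.
    No weights change: everything happens in the prompt."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    context = "\n".join(retrieve(query, documents))
    return f"{shots}\n\nContext: {context}\n\nQ: {query}\nA:"

docs = ["The refund window is 30 days.",
        "Shipping takes 5 business days."]
examples = [("What is the support email?", "support@example.com")]
prompt = build_prompt("How long is the refund window?", docs, examples)
# The prompt now contains the refund document as grounding context.
```

Fine-tuning in the weight-update sense is the opposite: the prompt stays plain, and the knowledge or behaviour is baked into the parameters via gradient descent.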

2/ If you do want to fully re-train the LLM (i.e. update its weights), why not just use "small" models (millions of parameters rather than billions) like BERT that are well suited to the end tasks (NER, classification) and fine-tune those instead?

3/ If you're fine-tuning LLMs (whatever your definition of fine-tuning is), mind sharing what for and how you're doing it?

P.S.: I asked a similar question on Reddit [1], but I'm reframing it a bit and asking here in the hope of getting answers that focus on the re-training aspect, and also to get a diverse set of views :)

[1] https://www.reddit.com/r/MachineLearning/comments/15xfesk/d_why_fine_tune_a_65b_llm_instead_of_using/