by lucaspauker on 7/27/23, 5:08 PM with 90 comments
by jph00 on 7/27/23, 10:27 PM
The article refers to the BERT and GPT papers as the source of the fine-tuning idea. However, we actually first demonstrated it for universal models in 2017 and published the ULMFiT (Howard and Ruder) paper in early 2018. Prior to that, Dai and Le demonstrated the technique for in-corpus datasets. So it would be more accurate to say the approach can be traced back to those two papers, rather than to BERT and GPT.
BERT and GPT showed the effectiveness of scaling up the amount of data and compute, and switching the model architecture to Transformers (amongst other things).
by LASR on 7/27/23, 11:28 PM
We have some 100k context models too that can ingest entire documents.
So right now, I would say fine-tuning is probably only useful for a very narrow set of use cases.
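For concreteness, the "skip fine-tuning, just put the whole document in the context window" pattern could look roughly like the minimal sketch below. It assumes the anthropic Python client and a 100k-context model; the model name, file name, and question are placeholders, not anything from the thread.

    # Minimal sketch: in-context document Q&A with a long-context model.
    # Assumes ANTHROPIC_API_KEY is set; model/file names are placeholders.
    import anthropic

    client = anthropic.Anthropic()

    with open("contract.txt") as f:   # hypothetical long document
        document = f.read()

    response = client.messages.create(
        model="claude-2",             # placeholder long-context model
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": document + "\n\nSummarize the termination clauses in the document above.",
        }],
    )
    print(response.content[0].text)

No gradient updates happen here; the document is simply supplied at inference time, which is why long contexts reduce the need for fine-tuning in retrieval-style use cases.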
by Animats on 7/27/23, 11:58 PM
Can anyone offer an example of a free public-facing LLM which has been fine-tuned by adding much specific info about some narrow area? Say, one that knows all the PR about some car brand or fandom? Somebody must have tried that by now.
by nullc on 7/27/23, 10:31 PM
Uhhh. I understand what was intended there, but while fine-tuning may reduce the rate of hallucinations (and make the remaining ones sound more plausible), it's not magic dust that makes a model accurate and trustworthy.
Unfortunately many people think this stuff is magic and care should be taken to not encourage people to confuse improvements with resolving the issue.
One way of characterizing the LLM accuracy problem is that it often looks very accurate and convincing even when it is emitting nonsense. If you cast the problem in those terms-- as a problem of looking more trustworthy than it actually is-- fine tuning actually exacerbates the problem.
by treprinum on 7/27/23, 10:24 PM
by mickeyfrac on 7/27/23, 8:50 PM
by SpaceManNabs on 7/27/23, 10:02 PM
You should try a post on parameter efficient tuning next!
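(For anyone unfamiliar, parameter-efficient tuning usually means something like LoRA: freeze the base model and train only small adapter matrices. A minimal sketch with the Hugging Face peft library follows; the checkpoint and hyperparameters are illustrative, not from the article.)

    # Minimal LoRA sketch with Hugging Face peft; checkpoint and
    # hyperparameters are illustrative only.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder model

    config = LoraConfig(
        r=8,                        # rank of the low-rank update
        lora_alpha=16,              # scaling applied to the update
        lora_dropout=0.05,
        target_modules=["c_attn"],  # GPT-2's fused attention projection
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # only the small adapters are trainable
    # `model` can then be trained with a normal Trainer/optimizer loop while
    # the original weights stay frozen.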
by bugglebeetle on 7/27/23, 10:34 PM
by coffee_am on 7/28/23, 6:43 AM
by zmmmmm on 7/28/23, 3:42 AM
The narrative goes, "look how awesome ChatGPT is, imagine how good it would be trained on just your company's documents".
Which 1000% misses the point. ChatGPT is what it is because (a) it is trained on almost nothing short of the entire corpus of human language ever created. At > 1 trillion parameters, it has well over 100 parameters for every human on the planet. Let that sink in. And then (b) because it has been subjected to an unknown but likely massive amount of human reinforcement feedback.
The idea that you can meaningfully impact the output of the model towards factual accuracy or logical correctness just by doing a small amount of fully automated training using a tiny corpus of company documents is seductive, but super far from robustly demonstrated as far as I'm aware. Yet this is the pitch being sold very often.
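To make the pitch concrete, "a small amount of fully automated training using a tiny corpus of company documents" usually amounts to a short continued-pretraining run, roughly like the sketch below (Hugging Face transformers/datasets; the checkpoint, file name, and hyperparameters are placeholders). Note that nothing in this loop optimizes for factual accuracy or logical correctness, which is the comment's point.

    # Minimal sketch of continued causal-LM training on a small text corpus.
    # Checkpoint, file name, and hyperparameters are placeholders.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling,
                              Trainer, TrainingArguments)

    model_name = "gpt2"                                   # placeholder model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # "company_docs.txt": one document per line (hypothetical file)
    dataset = load_dataset("text", data_files={"train": "company_docs.txt"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    train_set = dataset["train"].map(tokenize, batched=True,
                                     remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="ft-out",
                               num_train_epochs=3,
                               per_device_train_batch_size=4),
        train_dataset=train_set,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()   # the model now imitates the corpus; it is not "grounded"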
by phas0ruk on 7/27/23, 10:43 PM
by autokad on 7/27/23, 11:47 PM
by marcopicentini on 7/27/23, 11:19 PM
by ramesh31 on 7/27/23, 11:26 PM