by zepearl on 7/28/24, 10:35 PM with 3 comments
I then focused on, programmed, and played with the "backpropagation" network model until the early 2000s => it was fun, but not usable in my context. I then stopped fiddling with it and became inactive in this area.
An important property of a backpropagation network was (as far as I know) that it had to be fully retrained whenever the inputs changed (values of existing ones changed, or inputs/outputs were added or removed).
Question:
Is it still like that for the currently fancy algorithms (the ones developed by Google/Facebook/OpenAI/Xsomething/...), or are they now better, so that they can adapt without having to be fully retrained on the full set of (new/up-to-date) training data?
Asking because I lost track of the progress in this area over the last 20 years, and especially recently I understand nothing of all the new names (e.g. "llama", etc.).
Thanks :)
by Micoloth on 7/29/24, 7:57 AM
Backpropagation is still widely used; you can look it up.
A more challenging idea is whether it is possible to reuse the pretrained weights when training a network with a different architecture (maybe a bigger transformer with more heads, or something).
AFAIK this is not common practice; if you change the architecture, you have to retrain from scratch. But given the cost of these training runs, I wouldn't be surprised if OpenAI & co. had developed some technique for this, e.g. across GPT versions.
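One simple version of that idea can be sketched in a few lines (all names here are illustrative, not any lab's actual technique): embed each pretrained weight matrix into the corresponding, larger matrix of the new architecture, and initialize only the extra entries randomly.

```python
import numpy as np

def grow_weights(old, new_shape, rng):
    """Embed a pretrained weight matrix into a larger, freshly
    initialized one; the overlapping block keeps the old values."""
    new = rng.normal(0.0, 0.02, size=new_shape)
    r, c = old.shape
    new[:r, :c] = old  # reuse the pretrained block
    return new

rng = np.random.default_rng(0)
# hypothetical pretrained model with one 4x4 layer
pretrained = {"layer1": rng.normal(size=(4, 4))}
# "bigger" architecture: the same layer, widened to 6x6
grown = {name: grow_weights(w, (6, 6), rng) for name, w in pretrained.items()}
```

Whether this actually transfers any useful capability to the larger network is exactly the open question; the sketch only shows the mechanical part of the reuse.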
by vasili111 on 7/30/24, 2:55 AM
In many cases you do not need to do anything to the LLM; you can just use it as-is.
If it was not trained on data containing the information you are interested in, you can use a technique called RAG (Retrieval-Augmented Generation).
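A minimal sketch of the RAG idea (the documents, tokenizer, and retrieval scoring below are all illustrative, not a real library): retrieve the stored documents most similar to the question, then paste them into the prompt so the unmodified model can answer from text it was never trained on.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().replace("?", "").replace(".", "").split()

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# a tiny made-up document store the model was never trained on
documents = [
    "The warehouse inventory system was migrated in 2023.",
    "Employees may work remotely two days per week.",
    "Backpropagation adjusts weights via gradient descent.",
]

def retrieve(question, docs, k=1):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    return sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))),
                  reverse=True)[:k]

question = "How many days per week can employees work remotely?"
context = retrieve(question, documents)
prompt = f"Context: {' '.join(context)}\nQuestion: {question}"
# `prompt` (retrieved context + question) is what gets sent to the LLM;
# the model's weights are not touched at all.
```

Real systems use learned embeddings and vector databases instead of bag-of-words overlap, but the structure is the same: retrieve, then generate.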
You can also do fine-tuning, which is a kind of training, but on a small amount of data.
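Fine-tuning in miniature (an illustrative least-squares model in numpy, not an actual LLM): start from the pretrained weights and take a few gradient steps on a small new dataset, rather than retraining from scratch on everything.

```python
import numpy as np

def step(w, X, y, lr=0.1):
    """One gradient-descent step for least-squares: w <- w - lr * grad."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0, 0.5])

# "Pretraining": many examples, many steps, starting from scratch
X_big = rng.normal(size=(1000, 3))
y_big = X_big @ w_true
w = np.zeros(3)
for _ in range(200):
    w = step(w, X_big, y_big)

# "Fine-tuning": a handful of new examples, starting from the
# pretrained w instead of from zero
X_small = rng.normal(size=(5, 3))
y_small = X_small @ w_true
for _ in range(10):
    w = step(w, X_small, y_small, lr=0.05)
```

The key point for the original question: the second loop touches only five examples and a few steps, so adapting to new data is far cheaper than the full pretraining run.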