by KhoomeiK on 5/28/24, 6:48 PM with 0 comments
Over the last several months I've been hacking on a research project to determine whether the optimal compute allocation (scaling law) for training an LLM is sensitive to the complexity of the training data. I found that as data complexity increases, you need even more data than Chinchilla suggests!
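For anyone unfamiliar with the setup, here's a minimal sketch of the standard Chinchilla-style framing (Hoffmann et al., 2022): fit a parametric loss L(N, D) = E + A/N^alpha + B/D^beta and, under a FLOP budget C ≈ 6·N·D, solve for the parameter/token split that minimizes loss. The coefficients below are made up for illustration (not the values fitted in the preprint); the paper's point is that the fitted exponents, and hence this optimal split, shift with data complexity.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta.
# These coefficients are purely illustrative, not values from the preprint.
E, A, B, alpha, beta = 1.7, 400.0, 410.0, 0.34, 0.28

def loss(N, D):
    return E + A / N**alpha + B / D**beta

def optimal_split(C):
    """Given a FLOP budget C ~= 6 * N * D, find the loss-minimizing
    parameter count N (and the implied token count D = C / (6 * N))."""
    def objective(logN):
        N = np.exp(logN)
        return loss(N, C / (6.0 * N))
    res = minimize_scalar(objective, bounds=(np.log(1e6), np.log(1e12)), method="bounded")
    N = np.exp(res.x)
    return N, C / (6.0 * N)

N_opt, D_opt = optimal_split(1e21)
print(f"compute-optimal at 1e21 FLOPs: N ~ {N_opt:.3e} params, D ~ {D_opt:.3e} tokens")
```

With a smaller beta (loss falling off more slowly in data, as you'd expect for more complex data), the same budget tilts the optimum toward more tokens per parameter, which is the intuition behind "more data than Chinchilla suggests."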
I released the preprint just yesterday: https://arxiv.org/abs/2405.16684