by KhoomeiK on 5/28/24, 6:48 PM with 0 comments
Over the last several months I've been hacking on a research project to determine whether the optimal compute allocation (scaling law) for training an LLM is sensitive to the complexity of the training data. I found that as data complexity increases, you need even more data than Chinchilla suggests!
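For anyone unfamiliar with the setup, here's a minimal sketch of the standard Chinchilla-style framing (Hoffmann et al., 2022): fit a parametric loss L(N, D) = E + A/N^alpha + B/D^beta and, under a FLOP budget C ≈ 6·N·D, solve for the parameter/token split that minimizes loss. The coefficients below are made up for illustration (not the values fitted in the preprint); the paper's point is that the fitted exponents, and hence this optimal split, shift with data complexity.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta.
# These coefficients are purely illustrative, not values from the preprint.
E, A, B, alpha, beta = 1.7, 400.0, 410.0, 0.34, 0.28

def loss(N, D):
    return E + A / N**alpha + B / D**beta

def optimal_split(C):
    """Given a FLOP budget C ~= 6 * N * D, find the loss-minimizing
    parameter count N (and the implied token count D = C / (6 * N))."""
    def objective(logN):
        N = np.exp(logN)
        return loss(N, C / (6.0 * N))
    res = minimize_scalar(objective, bounds=(np.log(1e6), np.log(1e12)), method="bounded")
    N = np.exp(res.x)
    return N, C / (6.0 * N)

N_opt, D_opt = optimal_split(1e21)
print(f"compute-optimal at 1e21 FLOPs: N ~ {N_opt:.3e} params, D ~ {D_opt:.3e} tokens")
```

With a smaller beta (loss falling off more slowly in data, as you'd expect for more complex data), the same budget tilts the optimum toward more tokens per parameter, which is the intuition behind "more data than Chinchilla suggests."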
I released the preprint just yesterday: https://arxiv.org/abs/2405.16684