by 7d7n on 5/30/24, 6:43 PM with 1 comments
by swyx on 5/30/24, 7:20 PM
a year ago we were talking to MosaicML (https://x.com/swyx/status/1660033177178734592) about their 65k+ context model. now people yawn when we get yet another 1M-token context model. wild.
the TLDR in the pod seems to be that Meta chose to train Llama with a RoPE theta (base frequency) that can be scaled up for long-context finetuning. once Gradient noticed that, it was off to the races.
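for anyone curious what that knob actually is: below is a minimal sketch of rotary position embeddings with a configurable theta base. the function names and the specific theta values are illustrative, not Gradient's actual recipe; the point is just that a bigger base stretches the rotation periods so positions far past the original training length still land on angles the model has effectively seen.

```python
import numpy as np

def rope_frequencies(head_dim: int, theta: float = 10_000.0) -> np.ndarray:
    """Per-pair rotation frequencies for rotary position embeddings.

    theta is the base (10,000 in the original RoPE paper); raising it
    slows the rotations, which is the knob long-context finetunes turn
    up before continued training on longer sequences.
    """
    return 1.0 / (theta ** (np.arange(0, head_dim, 2) / head_dim))

def apply_rope(x: np.ndarray, positions: np.ndarray,
               theta: float = 10_000.0) -> np.ndarray:
    """Rotate query/key vectors x of shape (seq_len, head_dim) by position."""
    freqs = rope_frequencies(x.shape[-1], theta)    # (head_dim // 2,)
    angles = positions[:, None] * freqs[None, :]    # (seq_len, head_dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# same vectors, two bases: with the larger (hypothetical) theta, even a
# position of 1,000,000 gets rotated by modest angles instead of wrapping
# around many times, which is what makes the extension finetune tractable.
x = np.random.randn(4, 8)
pos = np.array([0, 1_000, 100_000, 1_000_000], dtype=np.float64)
short_base = apply_rope(x, pos, theta=10_000.0)
long_base = apply_rope(x, pos, theta=4_000_000.0)
```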