by milliondreams on 4/10/24, 5:47 PM with 38 comments
by VHRanger on 4/10/24, 7:13 PM
However, I'm curious about their scaling claims. They show a plot of how the model scales in training with the FLOPs you throw at it.
But the issue we should really be concerned with is the wall-clock time of training on a fixed amount of hardware.
Back in 2018 we could already train medium-sized RNNs; the issues were training wall time and training stability.
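A toy sketch of the point above (illustrative assumptions only, not measurements): even at equal FLOPs, an RNN's hidden-state recurrence forces one step per token, while a transformer layer can process all tokens in a sequence in parallel, so the critical path — and hence wall time on parallel hardware — differs by a factor of the sequence length.

```python
# Toy model of the sequential critical path, not of real kernels.
# Assumed numbers (layers, seq_len) are hypothetical.

def critical_path_transformer(seq_len: int, layers: int) -> int:
    # Each layer's attention/MLP runs over all tokens in parallel,
    # so the critical path is one step per layer.
    return layers

def critical_path_rnn(seq_len: int, layers: int) -> int:
    # Each token's hidden state depends on the previous token's,
    # so every layer must step through the sequence serially.
    return layers * seq_len

layers, seq_len = 24, 2048
ratio = critical_path_rnn(seq_len, layers) / critical_path_transformer(seq_len, layers)
print(ratio)  # → 2048.0: same FLOPs can still mean a far longer critical path
```

This is why a FLOPs-vs-loss plot alone doesn't settle the question: two architectures can sit on the same FLOP-scaling curve while one takes far longer in wall-clock time on the same hardware.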
by riku_iki on 4/10/24, 7:15 PM
Why wouldn't they select equal-size models?
by janwas on 4/11/24, 5:54 AM
by spxneo on 4/10/24, 8:54 PM