by milliondreams on 4/10/24, 5:47 PM with 38 comments
by VHRanger on 4/10/24, 7:13 PM
However, I'm curious about their scaling claims. They show a plot of how the model scales in training with the FLOPs you throw at it.
But the issue we should really be concerned with is the wall-clock time of training on a fixed amount of hardware.
Back in 2018 we could already train medium-sized RNNs; the issues were training wall time and training stability.
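A toy sketch of the point above (illustrative assumptions only, not measurements): even at equal FLOPs, an RNN's hidden-state recurrence forces one step per token, while a transformer layer can process all tokens in a sequence in parallel, so the critical path — and hence wall time on parallel hardware — differs by a factor of the sequence length.

```python
# Toy model of the sequential critical path, not of real kernels.
# Assumed numbers (layers, seq_len) are hypothetical.

def critical_path_transformer(seq_len: int, layers: int) -> int:
    # Each layer's attention/MLP runs over all tokens in parallel,
    # so the critical path is one step per layer.
    return layers

def critical_path_rnn(seq_len: int, layers: int) -> int:
    # Each token's hidden state depends on the previous token's,
    # so every layer must step through the sequence serially.
    return layers * seq_len

layers, seq_len = 24, 2048
ratio = critical_path_rnn(seq_len, layers) / critical_path_transformer(seq_len, layers)
print(ratio)  # → 2048.0: same FLOPs can still mean a far longer critical path
```

This is why a FLOPs-vs-loss plot alone doesn't settle the question: two architectures can sit on the same FLOP-scaling curve while one takes far longer in wall-clock time on the same hardware.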
by riku_iki on 4/10/24, 7:15 PM
Why wouldn't they select equal-size models?
by janwas on 4/11/24, 5:54 AM
by spxneo on 4/10/24, 8:54 PM