from Hacker News
Mosaic trained a 1B parameter model on 440 GPUs for 200B tokens
by ovaistariq on 4/21/23, 2:39 PM with 0 comments