by averylamp on 10/18/23, 4:46 PM with 57 comments
by tasdfqwer0897 on 10/18/23, 5:04 PM
Note that you can get the model weights on HuggingFace here: https://huggingface.co/adept/fuyu-8b
by fpgaminer on 10/18/23, 9:49 PM
Is there an associated paper? Or more specifically, details on the training dataset? It must have been a mix of text and VLM tasks; otherwise one capability or the other would have rotted during training. But I wonder whether they trained strictly on VLM corpora, or also used plain image-text pair datasets like the ones CLIP was trained on. It would be interesting if it were only the former.
Also makes me wonder if it could be trained on something like CommonCrawl with all the images retained and interspersed correctly throughout the text. This model could theoretically train just fine off that, effectively unlocking a whole new dataset.
And has there been an inspection of what the model outputs for predicted image "tokens"? Is it predicting projected image patches to any degree of accuracy? And could it therefore also generate images inline with text if another de-projection layer were trained?
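The mechanism being speculated about here can be sketched in a few lines: Fuyu-style decoder-only VLMs split the image into patches and linearly project each flattened patch into the transformer's embedding space, so image "tokens" sit in the same sequence as text embeddings. A minimal NumPy sketch, with toy dimensions and a hypothetical `W_out` de-projection head that is purely illustrative (not part of the released model):

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)        # (nH, nW, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)  # (num_patches, patch*patch*C)

rng = np.random.default_rng(0)
d_model = 64                                    # toy embedding width (assumption)
W_in = rng.normal(size=(16 * 16 * 3, d_model))  # linear projection into the LM
W_out = rng.normal(size=(d_model, 16 * 16 * 3)) # hypothetical de-projection head

img = rng.random((32, 32, 3))
patches = patchify(img)     # (4, 768) flattened patches
embeds = patches @ W_in     # (4, 64): image "tokens" fed to the decoder
recon = embeds @ W_out      # (4, 768): what a de-projection might predict
print(patches.shape, embeds.shape, recon.shape)
```

If the model's next-token predictions at image positions already approximate the projected patches, then training a `W_out`-style layer on top would be the cheapest test of the inline image generation idea raised above.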
by joanfihu on 10/21/23, 11:22 AM
https://joanfihu.wordpress.com/2023/10/19/evaluating-adepts-...
by abrichr on 10/19/23, 12:27 AM
For anyone interested in contributing to a fully open source alternative, join us at https://github.com/OpenAdaptAI/OpenAdapt
Lots of interesting work to be done, including integrating with Fuyu-8B!
by mark_l_watson on 10/19/23, 1:52 AM
I am also getting even more excited by the explosion of work on open models. I still haven’t adjusted to how good Mistral-7B is, and it runs on my Mac without breaking a sweat.