by freakynit on 6/28/24, 5:36 AM with 4 comments
It was able to produce very good images based on the training data, and it is such a simple network.
My question is: why is all that extra complexity needed in today's text-to-image models based on transformers? Wouldn't scaling this out work equally well?
Code: https://gist.github.com/freakynit/1118403ad80448ee0313ba6c879f8688
Generated image: https://imgur.com/LCHDBhI
by bjourne on 6/28/24, 1:09 PM
by p1esk on 6/28/24, 6:03 PM
by Am4TIfIsER0ppos on 6/28/24, 9:40 AM