by liuxiaopai on 11/13/23, 7:45 PM with 27 comments
by Animats on 11/14/23, 12:50 AM
It can do humans in passive poses, but ask for an action shot and it botches it badly. It needs more training data on how bodies move. Maybe load it up with stills from dance, martial arts, and sports.
by GaggiX on 11/13/23, 11:43 PM
It also uses the same idea as DALL-E 3: training the model on synthetic captions.
by ShamelessC on 11/14/23, 1:03 AM
by krasin on 11/14/23, 12:00 AM
by gigel82 on 11/14/23, 12:09 AM
>This integration allows running the pipeline with a batch size of 4 under 11 GBs of GPU VRAM. GPU VRAM consumption under 10 GB will soon be supported, too. Stay tuned.
by ilaksh on 11/13/23, 9:36 PM
by camdenlock on 11/14/23, 12:14 AM
by andromeduck on 11/14/23, 4:42 AM
by philmitchell47 on 11/14/23, 11:48 AM
- Existing models for data pseudo-labelling
- ImageNet pretraining
- A frozen text encoder
- A frozen image encoder