from Hacker News

How we got Stable Diffusion XL inference to under 2 seconds

by varunshenoy on 8/31/23, 8:20 PM with 5 comments

  • by cwillu on 9/1/23, 7:32 AM

    Playing around with the CFG technique, I'm finding that turning off guidance at the 40% mark causes requested fine details not to appear in the final image. This sorta implies that switching CFG midway and/or switching prompt vectors might be interesting from a prompting standpoint, but it kinda kills it as a performance optimization.
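
    The trade-off being discussed can be sketched in a few lines. Classifier-free guidance blends an unconditional and a conditional noise prediction each denoising step; the optimization is to stop guiding (and skip the second UNet pass) after some fraction of the steps. This is an illustrative sketch, not any library's API — the function name, `cutoff` parameter, and list-based tensors are all assumptions for demonstration:

    ```python
    def apply_cfg(cond, uncond, step, total_steps, scale=7.5, cutoff=0.4):
        """Guided noise prediction for one denoising step (illustrative).

        Before `cutoff * total_steps`, apply the usual CFG formula:
            uncond + scale * (cond - uncond)
        From the cutoff onward, return the conditional prediction alone,
        which also lets you skip the unconditional forward pass entirely.
        """
        if step < cutoff * total_steps:
            return [u + scale * (c - u) for c, u in zip(cond, uncond)]
        return cond  # guidance disabled: roughly half the compute per step
    ```

    With `cutoff=0.4` the last 60% of steps run unguided, which is where the speedup comes from — and, per the comment above, also where the requested fine details can get lost.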
  • by Tenoke on 9/1/23, 10:17 AM

    It's a bit weird to talk about steps but not about the sampler (20 steps with Euler vs. 20 steps with DPM++ 2M Karras are pretty different beasts in terms of both speed and quality).

    I also see compiling but no AITemplate, which seems to be among the hottest ways to speed up SD recently.
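
    For context on the sampler point: in Hugging Face diffusers, the sampler is the pipeline's scheduler and can be swapped after loading. A hedged sketch (assumes diffusers is installed, a CUDA GPU, and a diffusers version whose `DPMSolverMultistepScheduler` accepts `use_karras_sigmas`); it is configuration, not a benchmark:

    ```python
    import torch
    from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    # Swap the default sampler for DPM++ 2M Karras, reusing the existing
    # scheduler config so timestep spacing etc. stay consistent.
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config,
        use_karras_sigmas=True,  # the "Karras" sigma-schedule variant
    )

    image = pipe("a photo of an astronaut", num_inference_steps=20).images[0]
    ```

    The same 20 steps through Euler (`EulerDiscreteScheduler`) versus this scheduler can differ noticeably in both wall-clock time and output quality, which is why step counts alone don't pin down a benchmark.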

  • by yieldcrv on 9/1/23, 4:02 PM

    This could save a lot of money on Replicate.ai

    Especially if you are charging your users the same 1,000% markup while your own costs have been cut to a third and results are delivered faster

  • by gmerc on 9/1/23, 7:40 AM

    I don’t know man, out of the box on SD-Next it’s about 3-4 secs for a picture at 1024 with UniPC and 20 steps on a 4090