from Hacker News

Ask HN: Best cloud GPU solution for inference API?

by topoftheforts on 4/2/23, 1:37 PM with 2 comments

I want to build an API that runs Stable Diffusion (txt2img, img2img, etc.). I'm not an expert in ML, but I've studied a lot recently and my understanding is:

- I can't use the AUTOMATIC1111 API because it doesn't support concurrent requests.

- Generally, one GPU will only serve one request at a time, so I shouldn't rent a single cloud GPU server and build my own API on top of it (see the sketch after this list).

- Serverless is the way to go, then, but cold starts are a problem, and loading the model takes a long time.
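To illustrate the second point: on a single GPU server I'd essentially have to serialize requests myself. A minimal sketch of what that looks like, using FastAPI and diffusers (the model ID, endpoint name, and parameters are my placeholders, not anything a service prescribes):

    # Minimal sketch: one GPU, one request at a time, so an async lock
    # serializes generations. Model ID and endpoint are placeholders.
    import asyncio, base64, io

    import torch
    from diffusers import StableDiffusionPipeline
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    gpu_lock = asyncio.Lock()  # only one generation may touch the GPU at a time

    # Load weights once at startup; this load is exactly the cost that
    # serverless platforms pay again on every cold start.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    class Txt2Img(BaseModel):
        prompt: str
        steps: int = 30

    @app.post("/txt2img")
    async def txt2img(req: Txt2Img):
        async with gpu_lock:
            # run the blocking pipeline call off the event loop
            image = await asyncio.to_thread(
                lambda: pipe(req.prompt, num_inference_steps=req.steps).images[0]
            )
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        return {"image": base64.b64encode(buf.getvalue()).decode()}

A second concurrent request just waits on the lock, which is why a single box doesn't scale past one user at a time.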

I also want to note that I know there are services that run SD out of the box as an API, but I'm not interested in them, as I need to make specific changes to my model.

My current best attempt is banana.dev: I have pushed my model packaged with Cog and adapted to their Docker setup, but it is quite slow, and the service seems to be under a lot of stress lately, sometimes taking 1+ minute even to return an initial response (not the generated image, just a response saying that it started the job).
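For anyone unfamiliar with Cog: a predictor for this kind of setup looks roughly like the sketch below. The weights path and parameters are placeholders; the relevant point is that whatever setup() loads is what every cold start has to pay for.

    # predict.py -- rough sketch of a Cog predictor for SD txt2img.
    # "./weights" and the parameters are placeholders.
    import torch
    from cog import BasePredictor, Input, Path
    from diffusers import StableDiffusionPipeline

    class Predictor(BasePredictor):
        def setup(self):
            # Runs once per container start: this load is the cold-start cost.
            self.pipe = StableDiffusionPipeline.from_pretrained(
                "./weights", torch_dtype=torch.float16
            ).to("cuda")

        def predict(
            self,
            prompt: str = Input(description="Text prompt"),
            steps: int = Input(description="Denoising steps", default=30),
        ) -> Path:
            image = self.pipe(prompt, num_inference_steps=steps).images[0]
            out = Path("/tmp/out.png")
            image.save(out)
            return out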

Another option is replicate.com, but it's quite expensive and slow too.

Are those the best current options, or can I do something better with more work on my side? I have found stochastic.ai and modal.com, but I would like some opinions before I start building adapters for these services.
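For what it's worth, here is roughly what I imagine a Modal adapter would look like, as a minimal sketch assuming Modal's current Python API (the app name, GPU type, and model ID are my placeholders, and a real deployment would load the weights once per container rather than on every call):

    # Minimal sketch of Stable Diffusion on Modal, runnable with `modal run`.
    # App name, GPU type, and model ID are placeholders; a real deployment
    # would cache the pipeline per container instead of loading it per call.
    import modal

    app = modal.App("sd-inference")
    image = modal.Image.debian_slim().pip_install(
        "diffusers", "transformers", "accelerate", "torch"
    )

    @app.function(gpu="A10G", image=image)
    def generate(prompt: str) -> bytes:
        # imports live inside the function: these packages only exist
        # in the remote image, not necessarily on the local machine
        import io

        import torch
        from diffusers import StableDiffusionPipeline

        pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
        ).to("cuda")
        img = pipe(prompt).images[0]
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        return buf.getvalue()

    @app.local_entrypoint()
    def main(prompt: str = "a lighthouse at dusk"):
        with open("out.png", "wb") as f:
            f.write(generate.remote(prompt))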