by nihit-desai on 5/21/23, 10:08 PM with 59 comments
by tedivm on 5/21/23, 11:22 PM
One thing that always blows my mind is how much it is just not worth it to train LLMs in the cloud if you're a startup (and probably even less so for really large companies). Compared to 36-month reserved pricing, the break-even point was 8 months if you bought the hardware and rented some racks at a colo, and that includes the on-hands support. Having the dedicated hardware also meant that researchers were willing to experiment more when we weren't doing a planned training job, since it wouldn't pull from our budget. We spent a sizable chunk of our raise on that cluster, but it was worth every penny.
I will say that I would not put customer-facing inference on-prem at this point: the resiliency of the cloud normally offsets the pricing, and most inference can be done with cheaper hardware than training. For training, though, you can get away with a weaker SLA, and the cloud is always there if you really need to burst beyond what you've purchased.
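A back-of-envelope version of that break-even math, in Python. Every number below is an illustrative assumption (cluster price, colo fees, reserved cloud rate), not tedivm's actual figures:

    # Rough break-even sketch: buying a GPU cluster vs. 36-month reserved cloud pricing.
    # All numbers are illustrative assumptions, not real quotes.

    hardware_cost = 1_500_000           # assumed: GPU cluster purchased outright ($)
    colo_monthly = 15_000               # assumed: rack space, power, remote-hands ($/month)
    cloud_reserved_monthly = 210_000    # assumed: equivalent capacity at reserved rates ($/month)

    month = 0
    on_prem_total = hardware_cost
    cloud_total = 0
    while on_prem_total > cloud_total:
        month += 1
        on_prem_total += colo_monthly
        cloud_total += cloud_reserved_monthly

    print(f"Break-even after ~{month} months")
    # With these made-up numbers the cluster pays for itself in about 8 months,
    # the same ballpark as the figure quoted above.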
by fpgaminer on 5/22/23, 3:10 AM
LambdaLabs: For on-demand instances, they are the cheapest option available. Their offering is straightforward, and I've never had a problem. The downside is that their instance availability is spotty. Things seem to have gotten a little better in the last month, and 8x machines are available more often than not, but single A100s were rarely available for most of this year. Another downside is the lack of persistent storage, meaning you have to transfer your data every time you start a new instance. They have persistent storage in beta, but it's effectively useless since it's only in one region and there are no instances in that region that I've seen.
Jarvis: Didn't work for me when I tried them a couple months ago. The instances would never finish booting. It's also a pre-paid system, so you have to fill up your "balance" before renting machines. But their customer service was friendly and gave me a full refund so shrug.
GCP: This is my go-to so far. A100s are $1.1/hr interruptible (a rough sketch of what interruptions do to that effective rate is at the end of this comment), and of course you get all the other Google offerings like persistent disks, S3-style object storage (GCS), managed SQL, a container registry, etc. Availability of interruptible instances has been consistently quite good, if a bit confusing. I've had some machines up for a week solid without interruption, while other times I can tear down a stack of machines and immediately request a new one only to be told they are out of availability. The downsides are the usual GCP downsides: poor documentation, occasional weird glitches, and perhaps the worst billing system I've seen outside of the healthcare industry.
Vast.ai: They can be a good chunk cheaper, but at the cost of privacy, security, support, and reliability. Pre-load only. For certain workloads and if you're highly cost sensitive this is a good option to consider.
RunPod: Terrible performance issues. Pre-load only. Non-responsive customer support. I ended up having to get my credit card company involved.
Self-hosted: As a sibling comment points out, self hosting is a great option to consider. In particular "Having the dedicated hardware also meant that researchers were willing to experiment more". I've got a couple cards in my lab that I use for experimentation, and then throw to the cloud for big runs.
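The interruptible-pricing sketch mentioned in the GCP note above. The interruption rate, checkpoint interval, and on-demand price are assumptions for illustration only; the $1.1/hr figure is the one quoted in that comment:

    # Rough sketch: effective cost per useful GPU-hour on interruptible instances,
    # assuming work since the last checkpoint is lost on each preemption.
    # All numbers except the quoted spot price are illustrative assumptions.

    on_demand_price = 3.70        # assumed on-demand A100 price ($/hr)
    spot_price = 1.10             # interruptible price quoted above ($/hr)
    checkpoint_interval_hr = 1.0  # assumed: how often training state is saved
    interruptions_per_day = 0.5   # assumed: one preemption every other day on average

    # On average, half a checkpoint interval of work is lost per interruption.
    lost_hr_per_day = interruptions_per_day * (checkpoint_interval_hr / 2)
    useful_fraction = (24 - lost_hr_per_day) / 24

    effective_spot_price = spot_price / useful_fraction
    print(f"Effective interruptible cost: ${effective_spot_price:.2f}/useful hr "
          f"vs ${on_demand_price:.2f}/hr on-demand")
    # With these assumptions the preemption overhead is ~1%, so the spot discount
    # dominates; frequent preemptions or rare checkpoints would shift the math.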
by pavelstoev on 5/22/23, 1:43 AM
And links to PyPI:
=> https://pypi.org/project/deepview-profile/
=> https://pypi.org/project/deepview-predict/
And you can actually do it in the browser for several foundational models (more to come): => https://centml.ai/calculator/
Note: I have personal interests in this startup.
by stuckkeys on 5/21/23, 11:55 PM
by 1024core on 5/22/23, 4:10 AM
Given the heterogeneous nature of GPUs, RAM, tensor cores, etc., it would be nice to have a direct comparison of, say, teraflop-hours per dollar, or something like that.
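A sketch of what that comparison could look like. The peak-TFLOPS figures are approximate spec-sheet numbers and the hourly prices are placeholder assumptions, so treat the output as a template rather than a real ranking:

    # Rough "peak teraflop-hours per dollar" comparison across GPU rentals.
    # Peak FP16/BF16 tensor TFLOPS are approximate spec-sheet values;
    # $/hr prices are placeholder assumptions, not real quotes.

    gpus = {
        #  name            (approx. peak TFLOPS, assumed $/hr)
        "V100 16GB":       (125, 0.80),
        "A100 40GB":       (312, 1.10),
        "H100 80GB PCIe":  (756, 2.50),
    }

    for name, (tflops, price) in sorted(
            gpus.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True):
        print(f"{name:15s} {tflops / price:8.1f} peak TFLOP-hours per dollar")

    # Peak TFLOPS ignores memory capacity/bandwidth and real utilization, so
    # measured throughput per dollar (as in the article's benchmarks) can rank
    # the same cards quite differently.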
by TradingPlaces on 5/22/23, 3:03 PM
If you look at the throughput/$ metric, the V100 16GB looks like a great deal, followed by the H100 80GB PCIe (Gen 5). For most benchmarks, the A100 looks worse in comparison.
by dkobran on 5/21/23, 11:28 PM
Disclosure: I work on Paperspace
by KeplerBoy on 5/21/23, 11:53 PM
I know those cards are second-class citizens in the world of deep learning, but they have had (experimental) PyTorch support for a while now, so where are the offerings?
by nihit-desai on 5/21/23, 10:08 PM
by freediver on 5/22/23, 2:10 AM
by dotBen on 5/22/23, 4:38 AM
Wondering which vendors other HNers are using to achieve this.
by hcarlens on 5/22/23, 9:08 AM
Open to any feedback/suggestions! Will be adding 4090/H100 shortly.
by andrewstuart on 5/22/23, 12:25 AM
Consumer GPUs are much more available, much cheaper, and much faster.
by rlupi on 5/25/23, 5:51 AM
by crucifiction on 5/22/23, 4:34 AM
by hislaziness on 5/22/23, 1:13 AM
by aeturnum on 5/22/23, 2:19 AM
by MacsHeadroom on 5/22/23, 5:20 AM
by IceHegel on 5/22/23, 2:54 AM