by snow_mac on 7/26/23, 2:41 AM with 128 comments
by roenxi on 7/26/23, 9:43 AM
With AMD cards I expect system lockups when doing any sort of model inference; from the experience of the last few years I assume it's driver bugs. Based on their rate of improvement they will probably get there around 2025, but their past performance has been so bad that I wouldn't recommend buying a card for machine learning until they've proven they're taking the situation seriously.
That said, in my opinion, buy AMD anyway if you need a GPU on Linux. Their open-source drivers are a lot less hassle as long as you don't need BLAS.
by ItsBob on 7/26/23, 9:36 AM
I've only just started using it for Llama running locally on my computer at home and I have to say... colour me impressed.
It generates the output slightly faster than reading speed, so for me it works perfectly well.
The 24GB of VRAM should keep it relevant for a bit too, and I can always buy another and NVLink them should the need arise.
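For anyone curious what this looks like in practice, here's a minimal sketch of loading a Llama-family model on a 24GB card with Hugging Face transformers. The model id and the 4-bit bitsandbytes load are my own illustrative choices (you also need accelerate and bitsandbytes installed); the commenter doesn't say which runtime they actually use.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-13b-chat-hf"  # illustrative model choice

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",   # spreads layers across whatever GPUs are visible
        load_in_4bit=True,   # bitsandbytes quantization so a 13B model fits easily in 24GB
    )

    prompt = "Explain NVLink in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

With device_map="auto", the same script would also spread a larger model across two cards if a second one is added later.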
by Tepix on 7/26/23, 4:22 PM
Specs: 2x RTX 3090, NVLink Bridge, 128GB DDR4 3200 RAM, Ryzen 7 3700X, X570 SLI mainboard, 2TB M.2 NVMe SSD, air cooled mesh case.
Finding the 3-slot NVLink bridge is hard and it's usually expensive; I think it's not worth it in most cases, but I managed to find a cheap used one. Cooling is also a challenge: the cards are 2.7 slots wide and the spacing is usually 3 slots, so there isn't much room. Some people are putting 3D-printed shrouds on the back of the PC case to suck the air out of the cards with an extra external fan. Also, limiting the power from 350W to around 280W per card doesn't cost a lot of performance. The CPU does not limit performance at all; as long as you have 4 cores per GPU you're good.
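The power cap is usually set with something like `nvidia-smi -pl 280` per card. For completeness, a rough Python equivalent via the NVML bindings (pynvml is my own assumption here, and setting the limit normally requires root):

    import pynvml

    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            current_mw = pynvml.nvmlDeviceGetPowerManagementLimit(handle)  # milliwatts
            print(f"GPU {i}: current limit {current_mw / 1000:.0f} W")
            pynvml.nvmlDeviceSetPowerManagementLimit(handle, 280_000)      # cap at 280 W
    finally:
        pynvml.nvmlShutdown()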
by andy_ppp on 7/26/23, 2:09 PM
by nl on 7/26/23, 10:18 AM
In a competitive market that line has distortions where one player tries to undercut the other.
There are no bargains because there is almost no competitive pressure and so there is barely any distortion in that line.
by politelemon on 7/26/23, 9:27 AM
> AMD GPUs are great in terms of pure silicon: Great FP16 performance, great memory bandwidth. However, their lack of Tensor Cores or the equivalent makes their deep learning performance poor compared to NVIDIA GPUs. Packed low-precision math does not cut it. Without this hardware feature, AMD GPUs will never be competitive.
Edit: what about Intel Arc GPUs? Any hope there?
by fnands on 7/26/23, 3:46 PM
by frognumber on 7/26/23, 4:54 PM
For occasional use, the major constraint isn't speed so much as which models fit. I tend to look at $/GB of VRAM as my main spec. Something like a 3060 12GB is an outlier for fitting sensible models while being cheap.
I don't mind waiting a minute instead of 15 seconds for some complex inference if I do it a few times per day. Or having training be slower if it comes up once every few months.
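As a rough illustration of the $/GB-of-VRAM lens (the prices are just the ballpark figures quoted elsewhere in this thread, not current market data):

    # dollars per GB of VRAM for a few cards mentioned in the thread
    cards = {
        "RTX 3060 12GB": (300, 12),
        "RTX 3090 24GB (used)": (600, 24),
        "RTX 4090 24GB": (1600, 24),
    }

    for name, (price_usd, vram_gb) in sorted(cards.items(), key=lambda kv: kv[1][0] / kv[1][1]):
        print(f"{name}: ${price_usd / vram_gb:.0f}/GB")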
by PeterStuer on 7/26/23, 10:58 AM
by savandriy on 7/26/23, 11:28 AM
But after Stable Diffusion came out, I started to play around with it and was pleasantly surprised that the GPU could handle it!
The setup is a little messy, and Linux only.
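For reference, running it typically looks something like the sketch below with the diffusers library. I'm assuming a ROCm (or CUDA) build of PyTorch, where the device string is still "cuda", plus the usual low-VRAM tricks of half precision and attention slicing:

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,        # half precision to roughly halve VRAM use
    )
    pipe.enable_attention_slicing()       # trades a little speed for less VRAM
    pipe = pipe.to("cuda")                # on ROCm builds the device is still "cuda"

    image = pipe("a photo of a mountain lake at sunrise").images[0]
    image.save("out.png")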
For someone targeting AI, definitely pick an Nvidia card with 12+ GB of VRAM.
by reducesuffering on 7/26/23, 2:50 PM
4090 ($1,600) > 3090 ($1,300 new / ~$600 used) > 3060 ($300)
A used 3090 is the best value. Lots of models will need the 24GB of VRAM.
by pizza on 7/26/23, 2:54 PM
Let's say
- I have a motherboard + CPU + other components with plenty of PCIe lanes to spare; in total this part draws 250W (incl. the 25% extra wattage headroom)
- I start off with one RTX 4090, TDP 450W, ~600W with headroom.
- I want to scale up by adding more 4090s over time, as many as my PCIe lanes can support.
1. How do I add more PSUs over time?
2. Recommended initial PSU wattage? Recommended wattage for each additional pair of 4090s?
3. Recommended PSU brands and models for my use case?
4. Is it better to use PCIe Gen 5 spec-rated PSUs? ATX 3.0? 12VHPWR cables rather than the ordinary 8-pin cables? I've also read somewhere that power cables between different brands of PSUs are *not* interchangeable??
5. Whenever I add an additional PSU, do I need to do something special to electrically isolate the PCIe slots?
6. North American outlets are rated for ~15A * 120V, so roughly 1800W. I can just use one outlet per PSU whenever it's under 1800W, right? For simplicity, let's also ignore whatever load is already on that particular electrical circuit.
Each GPU means another 600W. Let's say I want to add another PSU for every 2 4090s. I understand that to sync the bootup of multiple PSUs you need an add2psu adapter. I understand the motherboard can provide ~75W per PCIe slot; I take it that the rest comes from the PSU power cables. I've seen conflicting advice online: apparently miners use electrically isolated PCIe x1 risers for additional power supplies, but I've also seen that it's fine as long as every input power cable for a given GPU comes from one PSU, regardless of whether it's the one that powers the motherboard. Either way, x1 risers are an unattractive option because of bandwidth limitations.
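To make the arithmetic concrete, here's the back-of-the-envelope version I'm working from (same assumptions as above: 250W base, ~600W per 4090 with headroom, ~1800W per 15A/120V outlet, one added PSU per pair of added cards):

    import math

    BASE_W = 250          # motherboard + CPU + misc, incl. headroom
    GPU_W = 600           # one RTX 4090 with transient headroom
    OUTLET_W = 15 * 120   # ~1800 W from one North American outlet

    def plan(num_gpus: int) -> None:
        total_w = BASE_W + num_gpus * GPU_W
        extra_psus = math.ceil(max(num_gpus - 1, 0) / 2)  # one added PSU per pair of added 4090s
        psus = 1 + extra_psus
        ok = total_w <= psus * OUTLET_W  # assuming each PSU gets its own outlet/circuit
        print(f"{num_gpus} x 4090: ~{total_w} W total, {psus} PSU(s), "
              f"{'fits' if ok else 'does NOT fit'} on {psus} separate 1800 W outlets")

    for n in range(1, 7):
        plan(n)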
pls help
by paul_funyun on 7/26/23, 10:27 PM
Two, I recommend ignoring electricity cost and using all you can. If it's cheaper now than it ever will be, use it while it's cheap. If it will go down due to renewables, nuclear, etc. in the future, it's good to buy up GPUs while their price is artificially depressed by energy fears.
Third, go for server-type PSUs and breakout boards. Server PSUs can't be beaten in watts per dollar, and they're extremely efficient.
Finally, consider scooping up some X79 and X99 Xeon boards from Chinese sellers. They're cheap as hell, have PCIe lanes out the wazoo, etc. That means you don't have to fool with as many mobos to run the same number of GPUs. If you go this route, don't get the bottom-of-the-barrel no-name motherboards; Machinist is a decent brand.
by andrewstuart on 7/26/23, 3:33 PM
But Nvidia's monopoly means they cripple their retail cards and push the AI stuff to data centers.
If there were many manufacturers of AI hardware and software, there would be abundant cheap products at every level.
AMD and Intel don’t seem to be able to compete and there’s no sign that will change.
So AI is going to remain expensive and hard to get for a very long time.
by graton on 7/26/23, 2:46 PM
by jcuenod on 7/26/23, 7:38 PM
by adultSwim on 7/31/23, 3:38 PM
by synergy20 on 7/26/23, 1:18 PM
Everyone talks about Nvidia GPUs and AMD MI250/MI300; where is Intel? Would love to have a third player.
by justinclift on 7/27/23, 11:19 AM
by lyapunova on 7/26/23, 6:38 PM
by arvinsim on 7/26/23, 8:53 AM
But I guess it's to be expected that Nvidia doesn't want to cannibalize the 4080.
by kristianp on 7/26/23, 4:18 PM
by xnx on 7/26/23, 1:09 PM
by 32gbsd on 7/26/23, 3:28 AM