by pella on 1/18/25, 12:28 PM with 99 comments
by btown on 1/18/25, 3:12 PM
A card with a fraction of the FLOPS of cutting-edge graphics cards (and ideally proportionally less power consumption), but with 64-128GB VRAM-equivalent, would be a gamechanger for letting people experiment with large multi-modal models, and seriously incentivize researchers to build the next generation of tensor abstraction libraries for both CUDA and ROCm/HIP. And for gaming, you could break new grounds on high-resolution textures. AMD would be back in the game.
Of course, if it's not real VRAM, it needs to be at least somewhat close on the latency and bandwidth front, so let's pop on over and see what's happening in this article...
> An Infinity Cache hit has a load-to-use latency of over 140 ns. Even DRAM on the AMD Ryzen 9 7950X3D shows less latency. Missing Infinity Cache of course drives latency up even higher, to a staggering 227 ns. HBM stands for High Bandwidth Memory, not low latency memory, and it shows.
Welp. Guess my wish isn't coming true today.
by mk_stjames on 1/18/25, 4:06 PM
Why is it I can't buy a single one of these, on a motherboard, in a workstation format case, to use as an insane workstation? Assuming you could program for the accelerator part, there is an entire world of x86-fixed CAD, engineering, and entertainment industry (rendering, etc) where people want a single, desktop machine with 128GB + of fast ram to number crunch.
There are Blender artists out there that build dual and quad RTX4090 machines with Threadrippers for $20k+ in components all day, because their render jobs pay for it.
There are engineering companies that would not bat an eye at dropping $30k on a workstation if it mean they could spin around 80 gigabyte CATIA models of cars or aircraft loaded in RAM quicker. I know this at least because I sure as hell did with with several HP Z-series machines costing whole-Toyota-Corolla prices over the years...
But these combined APU chips are relegated to these server units. In the end is this a driver problem? Just a software problem? A chicken and egg problem where no one is developing the support because there isn't the hardware on the market, and there isn't the hardware on the market because AMD thinks there is no use case?
Edit: and note my use cases mentioned don't rely on latency, really, like videogamers need to hit framerates. The cache miss latency mentioned in the article doesn't matter as much for these type of compute applications where the main problems are just loading and unloading the massive amount of data. Things like offline renders and post-processing CFD simulations. Not necessarily a video output framerate.
by erulabs on 1/18/25, 10:48 PM
- Software is over
- An impenetrable software moat protects Nvidia's market capitalization
by neuroelectron on 1/18/25, 3:27 PM
How exactly are "applications" developed for this? Or is that all proprietary knowledge? TinyBox has resorted to writing their own drivers for 7900 XTX
by ChuckMcM on 1/18/25, 9:34 PM
by amelius on 1/18/25, 2:44 PM
by buyucu on 1/19/25, 12:30 PM
by behnamoh on 1/18/25, 6:21 PM