from Hacker News

Matrix-vector multiplication implemented in off-the-shelf DRAM for Low-Bit LLMs

by cpldcpu on 5/4/25, 11:35 PM with 53 comments

  • by cpldcpu on 5/5/25, 6:12 AM

    Some more background information:

    One of the original proposals for in-DRAM compute: https://users.ece.cmu.edu/~omutlu/pub/in-DRAM-bulk-AND-OR-ie...

    First demonstration with off-the-shelf parts: https://parallel.princeton.edu/papers/micro19-gao.pdf

    DRAM Bender, the tool they are using to implement this: https://github.com/CMU-SAFARI/DRAM-Bender

    Memory-Centric Computing: Recent Advances in Processing-in-DRAM: https://arxiv.org/abs/2412.19275

  • by userbinator on 5/5/25, 6:07 AM

    Did anyone else notice the absolutely insane author lists of references 1 and 3?

    I was expecting to find this 2016 article in there: https://news.ycombinator.com/item?id=12469270

    This 2019 one does show up: https://news.ycombinator.com/item?id=22712811

    Of course, this "out of spec" behaviour of DRAM, more specifically the ability to do copying, is also implicated in this infamous bug: https://news.ycombinator.com/item?id=5314959

    It seems more than one person independently observed such a thing, and thought "this might be a useful behaviour".

  • by walterbell on 5/5/25, 5:29 AM

    > By intentionally issuing DRAM commands that violate manufacturer-specified timing parameters... [gaining] massive parallelism up to 65,536 bitwise operations in parallel.

    Take that, binary blobs for DRAM training!
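
    For intuition on why bulk bitwise parallelism maps onto low-bit GeMV: with 1-bit weights and 1-bit activations, each output element reduces to a bitwise AND followed by a popcount. A minimal C sketch of that reduction (an illustration of the mapping only, not the paper's actual kernel):

      #include <stdint.h>
      #include <stdio.h>

      /* Illustration only: a binary (1-bit) matrix-vector product becomes
       * AND + popcount per row -- the kind of bulk bitwise work that
       * massively parallel in-DRAM operations can execute. */
      #define ROWS 4
      #define WORDS_PER_ROW 2            /* 2 x 64 = 128 binary columns per row */

      int main(void) {
          uint64_t W[ROWS][WORDS_PER_ROW] = {
              {0xF0F0F0F0F0F0F0F0ull, 0x0F0F0F0F0F0F0F0Full},
              {0xAAAAAAAAAAAAAAAAull, 0x5555555555555555ull},
              {0xFFFFFFFFFFFFFFFFull, 0x0000000000000000ull},
              {0x123456789ABCDEF0ull, 0x0FEDCBA987654321ull},
          };
          uint64_t x[WORDS_PER_ROW] = {0xFFFF0000FFFF0000ull, 0x00FF00FF00FF00FFull};

          for (int r = 0; r < ROWS; r++) {
              int acc = 0;
              for (int w = 0; w < WORDS_PER_ROW; w++)
                  acc += __builtin_popcountll(W[r][w] & x[w]);  /* AND, then count set bits (GCC/Clang builtin) */
              printf("y[%d] = %d\n", r, acc);
          }
          return 0;
      }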

  • by robwwilliams on 5/5/25, 3:14 AM

    This is just mind-bendingly weird and wonderfully creative. It can pay to work in the weeds! Bravo.

  • by Bolwin on 5/5/25, 2:58 AM

    They're doing matrix operations in the DRAM itself? That sounds insane and also fascinating.

  • by chasd00 on 5/5/25, 4:30 PM

    In the hardware world, are there risks in taking advantage of a bug, knowing that the manufacturer may someday fix it? I know in the software world it's a bad idea to leverage a bug in a platform to enable a feature (or fix another bug): the bug you're counting on being present may get fixed 15 years in the future, and then your system explodes and no one knows why.

    edit: seems like there was a recent discussion about something similar... undefined behavior in some C function iirc

  • by protocolture on 5/5/25, 9:59 AM

    >General matrix-vector multiplication (GeMV)

    Ok, so my math isn't great.

    When I was studying Quaternions during my 3D math class (which I failed the first time; like I said, not a math guy), they briefly covered the history of matrix calculation in graphics development.

    My understanding is that Quaternions became popular because they are almost as accurate as matrices but much less complex computationally.

    Has anyone tried building an LLM using Quats instead of matrices?

    Or are the optimisations with Quaternions more useful in realtime?
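
    For reference, the GeMV being accelerated is just y = W·x, a dense weight matrix times an activation vector, whereas quaternions compactly encode 3D rotations, a much narrower job. A minimal C loop of the operation in question (sizes are arbitrary illustration values; real LLM layers are far larger):

      #include <stdio.h>

      /* Reference GeMV: y = W * x. Sizes here are tiny, for illustration only. */
      #define M 3   /* output rows */
      #define N 4   /* input columns */

      int main(void) {
          float W[M][N] = {
              {1, 2, 3, 4},
              {5, 6, 7, 8},
              {9, 10, 11, 12},
          };
          float x[N] = {1, 0, -1, 2};
          float y[M];

          for (int i = 0; i < M; i++) {
              y[i] = 0.0f;
              for (int j = 0; j < N; j++)
                  y[i] += W[i][j] * x[j];   /* multiply-accumulate along row i */
              printf("y[%d] = %g\n", i, y[i]);
          }
          return 0;
      }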

  • by morphle on 5/5/25, 6:27 AM

    It's a bit unscientific that they don't cite the original Intelligent RAM (IRAM) sources from 1997:

    https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=iram...

  • by willvarfar on 5/5/25, 5:35 AM

    Can we expect to see matrix multiplication and perhaps other ops move from classic CPUs out into the DRAM, perhaps with deliberate hardware support?

    And does such a processing shift give advantage to Samsung etc? Where does this leave NVIDIA etc?

  • by lolc on 5/6/25, 10:13 AM

    Funny hack. Without having read the paper, I'd assume the operations are thermally unstable, so LLM inference results will vary with environmental temperature :-)

  • by xiphias2 on 5/5/25, 6:52 AM

    This would be a cool way to make a cheap inferencing device for the largest LLMs.

  • by swimwiththebeat on 5/5/25, 6:07 AM

    So is this a new technique for doing computations within existing DRAM to overcome the memory-wall issue in modern computing?