by fgblanch on 4/25/24, 5:36 AM with 73 comments
by userbinator on 4/25/24, 1:43 PM
Except for Intel, which publishes lots of technical documentation on their GPUs: https://kiwitree.net/~lina/intel-gfx-docs/prm/
You can also find the i810/815 manuals elsewhere online; aside from an odd gap between those and the 965 (i.e. the 855/910/915/945 are missing for some reason), they've been pretty consistent with the documentation.
by ginko on 4/25/24, 9:46 AM
I know the terminology has gotten quite loose in recent years with Nvidia & Co. selling server-only variants of their graphics architectures as GPUs, but the "graphics" part of GPU designs makes up a significant part of the complexity to this day.
by jgarzik on 4/25/24, 11:49 AM
Here's another: https://github.com/jbush001/NyuziProcessor
by piotrrojek on 4/25/24, 10:34 AM
by vineyardlabs on 4/25/24, 3:56 PM
by novaRom on 4/25/24, 12:43 PM
by mk_stjames on 4/25/24, 1:32 PM
It's so easy to write "DIV: begin alu_out_reg <= rs / rt; end" in your Verilog, but that one line takes a lotta silicon. And the person simulating this might never see that if all they do is simulate the Verilog.
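To make that concrete, here's a minimal, hypothetical sketch (module and signal names are invented, not taken from the project): in simulation the DIV arm behaves exactly like the ADD arm, but a synthesizer expands the "/" into a large divider circuit.

    // Hypothetical ALU fragment -- not from the actual project.
    // DIV is one line, just like ADD, but synthesis infers a big
    // divider for "/" while simulation treats both arms the same.
    module tiny_alu (
        input  wire        clk,
        input  wire [31:0] rs,
        input  wire [31:0] rt,
        input  wire [1:0]  op,
        output reg  [31:0] alu_out_reg
    );
        localparam ADD = 2'b00, DIV = 2'b01;

        always @(posedge clk) begin
            case (op)
                ADD: alu_out_reg <= rs + rt;  // a handful of gates
                DIV: alu_out_reg <= rs / rt;  // a lot of silicon
                default: alu_out_reg <= 32'd0;
            endcase
        end
    endmodule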
by Narishma on 4/25/24, 10:15 AM
by Jasper_ on 4/25/24, 12:44 PM
> In real GPUs, individual threads can branch to different PCs, causing branch divergence where a group of threads initially being processed together has to split out into separate execution.
Whoops. Maybe this person should try programming for a GPU before attempting to build one out of silicon.
Not to mention the whole SIMD that... isn't.
(This is the same person who stapled together other people's circuits to blink an LED and claimed to have built a CPU)