by hazard on 1/1/24, 10:13 PM with 10 comments
What are the best resources for learning things like GPU architecture, CUDA, Triton, etc?
My goal is to be able to take a description of Flash Attention and implement it from scratch, or to optimize existing CUDA code.
by boberoni on 1/4/24, 1:24 AM
If you like lecture videos, I would recommend Hajj's YouTube playlist of 2021 lectures [2]. He works through a subset of the textbook [1].
This will give you a good foundation of GPU hardware architecture and CUDA programming. The knowledge is somewhat transferable to other areas of high-performance computing.
[1] https://www.amazon.com/Programming-Massively-Parallel-Proces...
[2] https://www.youtube.com/playlist?list=PLRRuQYjFhpmubuwx-w8X9...
by Const-me on 1/4/24, 2:37 AM
It’s vendor-agnostic, so HLSL instead of CUDA or Triton. Here are the compute shaders implementing inference of the Mistral-7B model: https://github.com/Const-me/Cgml/tree/master/Mistral/Mistral...
by Kon-Peki on 1/1/24, 11:24 PM
The Nvidia dev blog has some easy to follow tutorials, but they don’t get very complicated.
Nvidia also has a learning platform offering fairly decent paid courses; you get a certificate for finishing.
You’ll find some books out there with good reputations. Ultimately, this is an area that leans heavily toward paying money for good quality learning materials.
by the__alchemist on 1/4/24, 3:40 AM
Step 2: Figure out how to set up the FFI bindings if required for your project's language.
Step 3: Read this article to learn kernel syntax, block/thread/stride management etc: https://developer.nvidia.com/blog/even-easier-introduction-c...
Step 4: Ask ChatGPT to translate your code into modern C++, or perhaps even directly into kernels.
Don't bother with Vulkan compute and shaders etc. It works, but is high friction compared to CUDA.
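To make Step 3 concrete, here is a minimal sketch of the kind of kernel that article builds up to: a grid-stride SAXPY with a launch from host code. The names (saxpy, n, a, x, y) and sizes are illustrative, not from the thread, and this is untested example code rather than a definitive implementation:

```cuda
#include <cuda_runtime.h>

// Grid-stride kernel: each thread starts at its global index and
// advances by the total number of threads in the grid, so the loop
// covers any n regardless of the launch configuration.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (; i < n; i += stride)
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Unified memory keeps the example short; explicit cudaMemcpy
    // between host and device buffers is the other common pattern.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // 256 threads per block; enough blocks to cover n once.
    int block = 256;
    int grid = (n + block - 1) / block;
    saxpy<<<grid, block>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();  // each y[i] should end up as 4.0f

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

The `<<<grid, block>>>` launch syntax and the index/stride arithmetic are exactly the pieces the linked Nvidia article walks through.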
by throwaway81523 on 1/4/24, 5:43 AM
I would say ML concepts and algorithms are far more complicated than GPU programming per se. The fast.ai lectures were pretty understandable when I watched some of them a few years ago, though attention hadn't been invented yet, and it was pretty obvious that becoming skillful at writing even simple recognizers would take a fair amount of trial and error.
by Baldbvrhunter on 1/1/24, 10:21 PM
I've written CUDA kernels and I knew nothing about it going in.