by srush on 9/23/24, 12:20 PM
I made these a couple of years ago as a teaching exercise for
https://minitorch.github.io/. At the time the resources for doing anything on GPUs were pretty sparse and the NVIDIA docs were quite challenging.
These days there are great resources for going deep on this topic. The CUDA-mode org is particularly great, both their video series and PMPP reading groups.
by aleinin on 9/23/24, 3:36 PM
I recently ported this to Metal for Apple Silicon computers. If you're interested in learning GPU programming on an M series Mac, I think this is a very accessible option. Thanks to Sasha for making this!
https://github.com/abeleinin/Metal-Puzzles
by fifilura on 9/23/24, 2:02 PM
by saagarjha on 9/23/24, 6:04 PM
When working on GPU code there are really two parts to it, I feel. One is “how do I even write code for the GPU”, which this tutorial seems to cover, but there’s a second part, “how do I write good code for the GPU”, which seems like it would need another resource or an expansion of this one.
by ismailmaj on 9/23/24, 1:50 PM
It would be nice if the puzzles natively supported C++ CUDA.
by czhu12 on 9/23/24, 6:41 PM
I loved the tensor puzzles you made. I spent the morning revisiting and liking all the videos you've made on YouTube. Hope for many more in the future!
by throwaway314155 on 9/23/24, 3:09 PM
Either puzzle 4 has a bug in it or I'm losing my mind. (Possible solution below, so don't read if you want to go in fresh)
  # FILL ME IN (roughly 2 lines)
  if local_i < size and local_j < size:
      out[local_i][local_j] = a[local_i][local_j] + 10
Results in a failed assertion:
AssertionError: Wrong number of indices
But the test cell beneath it will still pass?
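One possible explanation for the assertion, sketched under assumptions: if the puzzle's array wrapper requires one index per dimension in a single subscript, then `a[local_i]` on a 2D array fails before the second `[local_j]` is ever applied. The `StrictTable` class and `add_ten` helper below are hypothetical stand-ins written for illustration, not the puzzle library's actual code. They only show how chained indexing can trigger a "Wrong number of indices" assertion while tuple indexing such as `a[local_i, local_j]` works.

```python
class StrictTable:
    """Hypothetical 2D table that demands one index per dimension."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]
        self.ndim = 2

    def _check(self, index):
        # A bare int such as a[0] becomes a 1-tuple, which is too short for 2D.
        if not isinstance(index, tuple):
            index = (index,)
        assert len(index) == self.ndim, "Wrong number of indices"
        return index

    def __getitem__(self, index):
        i, j = self._check(index)
        return self.rows[i][j]

    def __setitem__(self, index, value):
        i, j = self._check(index)
        self.rows[i][j] = value


def add_ten(out, a, size, local_i, local_j):
    # Same guard as in the comment above, but with tuple indexing.
    if local_i < size and local_j < size:
        out[local_i, local_j] = a[local_i, local_j] + 10
```

With this stand-in, `a[0][0]` raises the assertion on the first subscript, whereas `a[0, 0]` returns the element. Whether the real library behaves the same way is an assumption here.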
by wmil on 9/23/24, 4:54 PM
So I'm used to working with lists and maps, which doesn't really track well with tackling problems on thousands of cores.
Is the usual strategy to worry less about repeating calculations and just use brute force to tackle the problem?
Is there a good resource to read about how to tackle problems in an extremely parallel way?
by dejanig on 9/23/24, 4:52 PM
Wow, it looks really interesting. I will definitely look into it.
by az226 on 9/23/24, 7:59 PM
Can I hire you to make Flash Attention a reality for V100?
by xandrius on 9/23/24, 11:36 AM
Looks nice and fun but the "see-through" font for the titles in the screenshots gives me some deep and primordial unease, not sure why.
by 867-5309 on 9/23/24, 3:48 PM