from Hacker News

GPU Puzzles

by cgadski on 9/18/24, 1:08 PM with 40 comments

  • by srush on 9/23/24, 12:20 PM

    I made these a couple of years ago as a teaching exercise for https://minitorch.github.io/. At the time, resources for doing anything on GPUs were pretty sparse and the NVIDIA docs were quite challenging.

    These days there are great resources for going deep on this topic. The CUDA-mode org is particularly good, both its video series and its PMPP reading groups.

  • by aleinin on 9/23/24, 3:36 PM

    I recently ported this to Metal for Apple Silicon computers. If you're interested in learning GPU programming on an M series Mac, I think this is a very accessible option. Thanks to Sasha for making this!

    https://github.com/abeleinin/Metal-Puzzles

  • by fifilura on 9/23/24, 2:02 PM

    I think this course is also relevant for some deeper context.

    https://gfxcourses.stanford.edu/cs149/fall23/lecture/datapar...

  • by saagarjha on 9/23/24, 6:04 PM

    When working on GPU code, I feel there are really two parts to it. One is “how do I even write code for the GPU?”, which this tutorial seems to cover. The second is “how do I write good code for the GPU?”, which seems like it would need another resource or an expansion of this one.
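
    One concrete example of the second part is memory access patterns. The sketch below is a simplified CPU-side model, not GPU code: it assumes 32-thread warps, 4-byte elements and 128-byte memory transactions (real hardware details vary by architecture), and counts how many memory segments one warp's loads touch when neighboring threads read neighboring elements versus elements 32 apart. The names `segments_touched`, `contiguous` and `strided` are illustrative only.

```python
# Simplified model of global-memory coalescing. These constants are
# assumptions for illustration, not the exact behavior of any specific GPU.
WARP_SIZE = 32        # threads that issue a load together
ELEM_BYTES = 4        # e.g. float32
SEGMENT_BYTES = 128   # assumed size of one aligned memory transaction


def segments_touched(indices):
    """Count distinct aligned segments covered by one warp's element loads."""
    return len({(i * ELEM_BYTES) // SEGMENT_BYTES for i in indices})


def contiguous(warp_start):
    # Thread t reads element warp_start + t: neighbors read neighbors.
    return [warp_start + t for t in range(WARP_SIZE)]


def strided(warp_start, stride):
    # Thread t reads element (warp_start + t) * stride, e.g. walking down
    # a column of a row-major matrix that has `stride` columns.
    return [(warp_start + t) * stride for t in range(WARP_SIZE)]


if __name__ == "__main__":
    print("contiguous:", segments_touched(contiguous(0)))   # 1 segment
    print("strided:   ", segments_touched(strided(0, 32)))  # 32 segments
```

    Under these assumptions the contiguous pattern needs 1 segment per warp and the strided one needs 32, even though both kernels would compute the same result.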
  • by ismailmaj on 9/23/24, 1:50 PM

    It would be nice if the puzzles natively supported C++ CUDA.
  • by czhu12 on 9/23/24, 6:41 PM

    I loved the tensor puzzles you made. I spent the morning revisiting and liking all the videos you've made on YouTube. Hoping for many more in the future!
  • by throwaway314155 on 9/23/24, 3:09 PM

    Either puzzle 4 has a bug in it or I'm losing my mind. (Possible solution below, so don't read on if you want to go in fresh.)

        # FILL ME IN (roughly 2 lines)
        if local_i < size and local_j < size:
            out[local_i][local_j] = a[local_i][local_j] + 10

    Results in a failed assertion:

        AssertionError: Wrong number of indices

    But the test cell beneath it still passes?
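
    The error message suggests the puzzle's simulated arrays expect both indices in a single subscript (`out[local_i, local_j]`) rather than chained indexing (`out[local_i][local_j]`), which first indexes with one integer. That reading is an assumption, not confirmed from the puzzle library's source. The plain-Python mock below (`StrictArray2D` and `add_ten` are hypothetical names, not part of the puzzles) reproduces that behavior to illustrate it.

```python
# Hypothetical stand-in for a simulated GPU array that requires all indices
# in one subscript. This is NOT the puzzle library's actual code.
class StrictArray2D:
    def __init__(self, rows, cols, fill=0):
        self.shape = (rows, cols)
        self.data = [[fill] * cols for _ in range(rows)]

    def _check(self, key):
        # a[i] passes a plain int; a[i, j] passes the tuple (i, j).
        if not isinstance(key, tuple):
            key = (key,)
        assert len(key) == len(self.shape), "Wrong number of indices"
        return key

    def __getitem__(self, key):
        i, j = self._check(key)
        return self.data[i][j]

    def __setitem__(self, key, value):
        i, j = self._check(key)
        self.data[i][j] = value


def add_ten(out, a, local_i, local_j, size):
    # Kernel body for one (local_i, local_j) "thread", using tuple indexing.
    if local_i < size and local_j < size:
        out[local_i, local_j] = a[local_i, local_j] + 10


if __name__ == "__main__":
    size = 2
    a = StrictArray2D(size, size)
    for i in range(size):
        for j in range(size):
            a[i, j] = i * size + j
    out = StrictArray2D(size, size)
    # Launch more "threads" than positions; the guard skips the extras.
    for i in range(size + 1):
        for j in range(size + 1):
            add_ten(out, a, i, j, size)
    print(out.data)  # [[10, 11], [12, 13]]
    try:
        a[0][0]  # chained indexing passes a single index first
    except AssertionError as e:
        print("AssertionError:", e)
```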
  • by wmil on 9/23/24, 4:54 PM

    So I'm used to working with lists and maps, which don't really translate well to tackling problems on thousands of cores.

    Is the usual strategy to worry less about repeating calculations and just use brute force to tackle the problem?

    Is there a good resource to read about how to tackle problems in an extremely parallel way?
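
    One common technique is to restructure the work into rounds of independent operations instead of one long sequential loop. The sketch below is a plain-Python simulation, not GPU code (`tree_reduce_sum` is a hypothetical name), of a tree reduction for a sum: in each round, "thread" i adds the element `stride` slots away, every addition within a round is independent, and the whole sum finishes in about log2(n) rounds.

```python
def tree_reduce_sum(values):
    """Simulate a tree (pairwise) reduction of `values`, round by round.

    Returns (total, rounds). Each round stands for one step between
    barriers on a GPU: every addition in it is independent of the others.
    """
    buf = list(values)
    # Pad to a power of two with 0, the identity for addition.
    size = 1
    while size < len(buf):
        size *= 2
    buf += [0] * (size - len(buf))

    rounds = 0
    stride = size // 2
    while stride > 0:
        # "Thread" i (for i < stride) adds the element `stride` slots away.
        # All new values are computed before any are written back, which
        # mimics a barrier between rounds.
        updates = [buf[i] + buf[i + stride] for i in range(stride)]
        buf[:stride] = updates
        stride //= 2
        rounds += 1
    return buf[0], rounds


if __name__ == "__main__":
    print(tree_reduce_sum(range(1, 9)))  # (36, 3)
    print(tree_reduce_sum(range(10)))    # (45, 4)
```

    For 8 inputs this does the same 7 additions a sequential loop would, but arranged into 3 rounds that a GPU could each run in parallel, so the approach is less about brute force than about reshaping the dependency chain.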

  • by dejanig on 9/23/24, 4:52 PM

    Wow, it looks really interesting. I will definitely look into it.
  • by az226 on 9/23/24, 7:59 PM

    Can I hire you to make Flash Attention a reality for V100?
  • by xandrius on 9/23/24, 11:36 AM

    Looks nice and fun but the "see-through" font for the titles in the screenshots gives me some deep and primordial unease, not sure why.
  • by 867-5309 on 9/23/24, 3:48 PM

    seems like an opportune moment to gift a plug for bitcoin puzzles, namely BTC32 / 1000 BTC Challenge[1]

    pools are in dire need of cuda developers

    [1] https://bitcointalk.org/index.php?topic=1306983.0