by atallahw on 6/25/25, 3:17 PM
This was fun to work on. LLMs for writing kernels still have a long way to go. It's honestly a little surprising how decent they are now. I guess I've been pretty consistently "surprised" by codegen for a while now (meaning the last two years).
by mohsaied on 6/25/25, 3:14 PM
This is the first step towards fully automated GPU performance optimization. The idea is to automatically generate GPU kernels, then automatically integrate them into vLLM/SGLang/PyTorch.
by essamwisam on 6/25/25, 3:19 PM
Quite cool. It's interesting that the LLM is able to optimize code based on the target hardware itself.