by mukel on 8/8/23, 10:51 AM with 18 comments
by gavinray on 8/8/23, 3:02 PM
Looked at the author and realized it's Alfonso from the Graal team -- makes sense.
I wonder whether the "matmul" code could be further optimized with the Vector API and SIMD.
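[A minimal sketch of what a Vector API matmul could look like, using the `jdk.incubator.vector` incubator module (JDK 16+, run with `--add-modules jdk.incubator.vector`). The method and class names here are illustrative, not taken from llama2.java; the loop follows the usual row-times-vector pattern of llama2-style inference.]

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class MatmulSimd {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // xout[i] = dot(w[i*n .. i*n+n-1], x), for a d x n weight matrix stored row-major
    static void matmul(float[] xout, float[] x, float[] w, int n, int d) {
        for (int i = 0; i < d; i++) {
            FloatVector acc = FloatVector.zero(SPECIES);
            int j = 0;
            int upper = SPECIES.loopBound(n);
            for (; j < upper; j += SPECIES.length()) {
                FloatVector wv = FloatVector.fromArray(SPECIES, w, i * n + j);
                FloatVector xv = FloatVector.fromArray(SPECIES, x, j);
                acc = wv.fma(xv, acc); // fused multiply-add across SIMD lanes
            }
            float sum = acc.reduceLanes(VectorOperators.ADD);
            for (; j < n; j++) { // scalar tail for the remainder
                sum += w[i * n + j] * x[j];
            }
            xout[i] = sum;
        }
    }

    public static void main(String[] args) {
        // 2x4 identity-like weight matrix: picks out the first two entries of x
        float[] x = {1f, 2f, 3f, 4f};
        float[] w = {1f, 0f, 0f, 0f, 0f, 1f, 0f, 0f};
        float[] xout = new float[2];
        matmul(xout, x, w, 4, 2);
        System.out.println(xout[0] + " " + xout[1]);
    }
}
```

[The JIT maps `FloatVector` operations onto AVX2/AVX-512 or NEON where available, which is why this kind of rewrite tends to speed up the matmul hot loop.]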
by atairov on 8/13/23, 10:08 PM
Just in case anyone is interested in a Python version: I spent some time over the weekend and ported it to pure Python -- https://github.com/tairov/llama2.py
I never knew it would take only about 500 lines of core code to implement inference for such cutting-edge AI technology.
by shortrounddev2 on 8/8/23, 2:09 PM
by jiehong on 8/8/23, 6:38 PM
Any abstraction for GPGPU or shaders programming?