from Hacker News

Llama2.java: Karpathy's llama2.c ported to Java

by mukel on 8/8/23, 10:51 AM with 18 comments

  • by gavinray on 8/8/23, 3:02 PM

    The Java code is impressively written, using newer features like MemorySegment.
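    For context on what using MemorySegment buys here: llama2.c memory-maps the checkpoint file and reads weights in place, and Java's Foreign Function & Memory API can do the same without copying into heap arrays. A minimal sketch (not the repo's actual code; requires a JDK where java.lang.foreign is available, final as of JDK 22; the file written here is a hypothetical stand-in for a checkpoint):

    ```java
    import java.io.IOException;
    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.Arrays;

    public class MmapWeights {
        // Memory-map `file` and read its first n float32 values in place.
        static float[] readFloats(Path file, int n) throws IOException {
            try (Arena arena = Arena.ofConfined();
                 FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // The Arena controls the lifetime of the mapping.
                MemorySegment seg = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
                float[] out = new float[n];
                for (int i = 0; i < n; i++) {
                    out[i] = seg.getAtIndex(ValueLayout.JAVA_FLOAT, i);
                }
                return out;
            }
        }

        public static void main(String[] args) throws IOException {
            // Write a tiny stand-in "checkpoint" file (hypothetical; a real
            // llama2.c checkpoint is a small header followed by float32 weights).
            Path file = Files.createTempFile("weights", ".bin");
            ByteBuffer bb = ByteBuffer.allocate(16).order(ByteOrder.nativeOrder());
            for (int i = 0; i < 4; i++) bb.putFloat(i * 1.5f);
            Files.write(file, bb.array());

            System.out.println(Arrays.toString(readFloats(file, 4)));
            Files.delete(file);
        }
    }
    ```

    The try-with-resources Arena means the mapping is released deterministically, which is one of the advantages MemorySegment has over the old ByteBuffer mmap route.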

    Looked at the author and realized it's Alfonso from the Graal team -- makes sense.

    I wonder whether the "matmul" code could be further optimized with the Vector API and SIMD.
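    A Vector API matmul along the lines the comment suggests might look like the sketch below (my illustration, not the repo's actual code; class and method names are hypothetical; requires JDK 16+ with `--add-modules jdk.incubator.vector` at compile and run time):

    ```java
    import jdk.incubator.vector.FloatVector;
    import jdk.incubator.vector.VectorOperators;
    import jdk.incubator.vector.VectorSpecies;

    public class MatMulVector {
        // The widest vector shape the hardware supports (e.g. 8 floats on AVX2).
        static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

        // y = W @ x, with W stored row-major as (rows x cols).
        static void matmul(float[] y, float[] x, float[] w, int rows, int cols) {
            for (int i = 0; i < rows; i++) {
                int base = i * cols;
                FloatVector acc = FloatVector.zero(SPECIES);
                int j = 0;
                int upper = SPECIES.loopBound(cols);
                for (; j < upper; j += SPECIES.length()) {
                    FloatVector wv = FloatVector.fromArray(SPECIES, w, base + j);
                    FloatVector xv = FloatVector.fromArray(SPECIES, x, j);
                    acc = wv.fma(xv, acc); // fused multiply-add, one lane per element
                }
                float sum = acc.reduceLanes(VectorOperators.ADD);
                for (; j < cols; j++) sum += w[base + j] * x[j]; // scalar tail
                y[i] = sum;
            }
        }

        public static void main(String[] args) {
            float[] w = {1, 2, 3, 4, 5, 6, 7, 8}; // 2x4 matrix
            float[] x = {1, 1, 1, 1};
            float[] y = new float[2];
            matmul(y, x, w, 2, 4);
            System.out.println(y[0] + " " + y[1]); // 10.0 26.0
        }
    }
    ```

    Whether this beats the JIT's auto-vectorization of a plain scalar loop depends on the JDK version and hardware, so it would need benchmarking.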

  • by atairov on 8/13/23, 10:08 PM

    Thanks for sharing this! It's great to have a reference implementation written in Java. Given the original's simplicity, it's really easy to follow the Llama architecture logic.

    In case anyone is interested in a Python version, I spent some time over the weekend and ported it to pure Python -- https://github.com/tairov/llama2.py

    I never knew it would take only about 500 lines of core code to implement inference for such a cutting-edge AI technology.

  • by mukel on 8/8/23, 10:51 AM

    A Java port of llama2.c that performs very close to C on large models. Llama 2 7B runs at a whopping 1.6 tokens/s.

  • by shortrounddev2 on 8/8/23, 2:09 PM

    Have any of you used these things for anything useful? I can't get them to give useful results on my 3060 8GB. If I wanted decent results I think I'd need to rent a GPU node somewhere, but ChatGPT is still free.

  • by jiehong on 8/8/23, 6:38 PM

    This makes me wonder: what’s the status of GPU programming on the JVM?

    Any abstractions for GPGPU or shader programming?