from Hacker News

Llama2.java: Karpathy's llama2.c ported to Java

by mukel on 8/8/23, 10:51 AM with 18 comments

  • by gavinray on 8/8/23, 3:02 PM

    The Java code is impressively written, using newer features like MemorySegment.
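    For context on what using MemorySegment buys here: llama2.c memory-maps the checkpoint file and reads weights in place, and Java's Foreign Function & Memory API can do the same without copying into heap arrays. A minimal sketch (not the repo's actual code; requires a JDK where java.lang.foreign is available, final as of JDK 22; the file written here is a hypothetical stand-in for a checkpoint):

    ```java
    import java.io.IOException;
    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.Arrays;

    public class MmapWeights {
        // Memory-map `file` and read its first n float32 values in place.
        static float[] readFloats(Path file, int n) throws IOException {
            try (Arena arena = Arena.ofConfined();
                 FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // The Arena controls the lifetime of the mapping.
                MemorySegment seg = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
                float[] out = new float[n];
                for (int i = 0; i < n; i++) {
                    out[i] = seg.getAtIndex(ValueLayout.JAVA_FLOAT, i);
                }
                return out;
            }
        }

        public static void main(String[] args) throws IOException {
            // Write a tiny stand-in "checkpoint" file (hypothetical; a real
            // llama2.c checkpoint is a small header followed by float32 weights).
            Path file = Files.createTempFile("weights", ".bin");
            ByteBuffer bb = ByteBuffer.allocate(16).order(ByteOrder.nativeOrder());
            for (int i = 0; i < 4; i++) bb.putFloat(i * 1.5f);
            Files.write(file, bb.array());

            System.out.println(Arrays.toString(readFloats(file, 4)));
            Files.delete(file);
        }
    }
    ```

    The try-with-resources Arena means the mapping is released deterministically, which is one of the advantages MemorySegment has over the old ByteBuffer mmap route.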

    Looked at the author and realized it's Alfonso from the Graal team -- makes sense.

    I wonder whether the "matmul" code could be further optimized with the Vector API and SIMD.
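    A Vector API matmul along the lines the comment suggests might look like the sketch below (my illustration, not the repo's actual code; class and method names are hypothetical; requires JDK 16+ with `--add-modules jdk.incubator.vector` at compile and run time):

    ```java
    import jdk.incubator.vector.FloatVector;
    import jdk.incubator.vector.VectorOperators;
    import jdk.incubator.vector.VectorSpecies;

    public class MatMulVector {
        // The widest vector shape the hardware supports (e.g. 8 floats on AVX2).
        static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

        // y = W @ x, with W stored row-major as (rows x cols).
        static void matmul(float[] y, float[] x, float[] w, int rows, int cols) {
            for (int i = 0; i < rows; i++) {
                int base = i * cols;
                FloatVector acc = FloatVector.zero(SPECIES);
                int j = 0;
                int upper = SPECIES.loopBound(cols);
                for (; j < upper; j += SPECIES.length()) {
                    FloatVector wv = FloatVector.fromArray(SPECIES, w, base + j);
                    FloatVector xv = FloatVector.fromArray(SPECIES, x, j);
                    acc = wv.fma(xv, acc); // fused multiply-add, one lane per element
                }
                float sum = acc.reduceLanes(VectorOperators.ADD);
                for (; j < cols; j++) sum += w[base + j] * x[j]; // scalar tail
                y[i] = sum;
            }
        }

        public static void main(String[] args) {
            float[] w = {1, 2, 3, 4, 5, 6, 7, 8}; // 2x4 matrix
            float[] x = {1, 1, 1, 1};
            float[] y = new float[2];
            matmul(y, x, w, 2, 4);
            System.out.println(y[0] + " " + y[1]); // 10.0 26.0
        }
    }
    ```

    Whether this beats the JIT's auto-vectorization of a plain scalar loop depends on the JDK version and hardware, so it would need benchmarking.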

  • by atairov on 8/13/23, 10:08 PM

    Thanks for sharing this! It's great to have a reference implementation written in Java. Given the original's simplicity, it's really easy to follow the Llama architecture logic.

    In case anyone is interested in a Python version, I spent some time over the weekend and ported it to pure Python -- https://github.com/tairov/llama2.py

    I never knew it would take only about 500 lines of core code to implement inference for such a cutting-edge AI technology.

  • by mukel on 8/8/23, 10:51 AM

    A Java port of llama2.c that performs very close to C on large models. Llama 2 7B runs at a whopping 1.6 tokens/s.

  • by shortrounddev2 on 8/8/23, 2:09 PM

    Have any of you used these things for anything useful? I can't get them to give useful results on my 3060 8GB. If I wanted decent results I think I'd need to rent a GPU node somewhere, but ChatGPT is still free.

  • by jiehong on 8/8/23, 6:38 PM

    This makes me wonder: what’s the status of GPU programming on the JVM?

    Any abstractions for GPGPU or shader programming?