by mukel on 10/14/24, 8:12 AM
Features:
- Single file, no dependencies
- GGUF format parser
- Llama 3 tokenizer
- Supports Llama 3, 3.1 (ad-hoc RoPE scaling) and 3.2 (tied word embeddings)
- Fast matrix-vector multiplication routines for Q4_0 and Q8_0 quantized tensors using Java's Vector API
- GraalVM's Native Image support
- AOT model preloading for instant time-to-first-token
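
To make the quantized matrix-vector item concrete, here is a minimal scalar sketch of what a Q8_0 routine computes. It is not the project's code and does not use the Vector API. It assumes the usual GGML Q8_0 layout (blocks of 32 weights, each stored as a little-endian fp16 scale followed by 32 signed 8-bit values), and the names `Q8_0Sketch`, `halfToFloat`, `dot` and `matVec` are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: a scalar Q8_0 matrix-vector product, assuming the GGML block layout.
final class Q8_0Sketch {
    static final int BLOCK_SIZE = 32;               // weights per Q8_0 block
    static final int BLOCK_BYTES = 2 + BLOCK_SIZE;  // fp16 scale + 32 int8 quants

    /** Decodes an IEEE 754 half-precision value held in a short. */
    static float halfToFloat(short h) {
        int bits = h & 0xFFFF;
        int exp = (bits >>> 10) & 0x1F;
        int mant = bits & 0x3FF;
        float value;
        if (exp == 0) {
            value = Math.scalb((float) mant, -24);            // zero or subnormal
        } else if (exp == 31) {
            value = (mant == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            value = Math.scalb(1f + mant / 1024f, exp - 15);  // normal number
        }
        return (bits & 0x8000) == 0 ? value : -value;
    }

    /** Dot product of one quantized row, starting at byte offset rowOffset, with x. */
    static float dot(ByteBuffer weights, int rowOffset, float[] x) {
        float sum = 0f;
        for (int b = 0; b < x.length / BLOCK_SIZE; b++) {
            int base = rowOffset + b * BLOCK_BYTES;
            float scale = halfToFloat(weights.getShort(base));
            float blockSum = 0f;
            for (int i = 0; i < BLOCK_SIZE; i++) {
                // Each weight is quant * scale; the scale is factored out of the inner loop.
                blockSum += weights.get(base + 2 + i) * x[b * BLOCK_SIZE + i];
            }
            sum += scale * blockSum;
        }
        return sum;
    }

    /** Multiplies a rows x cols Q8_0 matrix by x; cols must be a multiple of 32. */
    static float[] matVec(ByteBuffer weights, int rows, int cols, float[] x) {
        int rowBytes = cols / BLOCK_SIZE * BLOCK_BYTES;
        float[] out = new float[rows];
        for (int r = 0; r < rows; r++) {
            out[r] = dot(weights, r * rowBytes, x);
        }
        return out;
    }
}
```

The buffer passed in is expected to be in `ByteOrder.LITTLE_ENDIAN`. A vectorized version would replace the inner loop with SIMD lanes, but the arithmetic per block stays the same.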