from Hacker News

Show HN: Python Bindings for llama.cpp with some CLIs

by tantony on 3/19/23, 2:17 AM with 0 comments

These are my Python bindings for @ggerganov's llama.cpp. They build on that work and provide an easy-to-use interface for Python developers to take advantage of llama.cpp's powerful inference capabilities.
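Once installed, calling the bindings from Python should look roughly like the sketch below. The class and method names here (`PyLLAMA`, `generate`, the model path) are illustrative placeholders, not the documented API; check the repository for the real interface.

```python
# Hypothetical usage sketch -- names below are guesses, not the actual API.
try:
    import llamacpp  # installed via `pip install llamacpp`
except ImportError:
    llamacpp = None  # package not available; skip the demo

if llamacpp is not None:
    # Load a quantized GGML model (path is a placeholder) and
    # generate a completion for a short prompt.
    model = llamacpp.PyLLAMA("./models/7B/ggml-model-q4_0.bin")  # hypothetical
    print(model.generate("The capital of France is"))            # hypothetical
else:
    print("llamacpp is not installed; skipping the demo")
```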

The bindings currently use code from a pending PR of mine that turns the original code into more of a library. Hopefully it will get merged into the main repository soon. I have also added a few CLI entry points that get installed along with the Python package:

* llamacpp-convert - Convert PyTorch models to GGML format. This is an alias for the existing Python script in llama.cpp and requires PyTorch.

* llamacpp-quantize - Perform INT4 quantization on the GGML model. This is a wrapper for the "quantize" C++ program from the original repository and has no dependencies.

* llamacpp-cli - This is a Python version of the "main.cpp" program from the original repository that utilizes the bindings.

* llamacpp-chat - A wrapper over llamacpp-cli that adds a prompt to make it behave like a chatbot. It's not very good right now.
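Put together, a typical workflow with these tools might look like the sketch below. The model paths are placeholders, and the flags are assumptions based on llama.cpp's own programs rather than documented usage.

```shell
# Sketch only: paths and flags are assumptions, not documented usage.
if command -v llamacpp-convert >/dev/null 2>&1; then
    # 1. Convert the PyTorch checkpoint to GGML (requires PyTorch);
    #    the trailing "1" requests f16 output in the upstream convert script.
    llamacpp-convert ./models/7B/ 1

    # 2. INT4-quantize the GGML model (no dependencies).
    llamacpp-quantize ./models/7B/

    # 3. Talk to the quantized model via the chat wrapper
    #    (-m is assumed to mirror llama.cpp's main program).
    llamacpp-chat -m ./models/7B/ggml-model-q4_0.bin
else
    echo "llamacpp CLIs not found on PATH; skipping"
fi
```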

You should theoretically be able to do "pip install llamacpp" and get going on most Linux/macOS platforms by just running `llamacpp-cli`. I do not have Windows builds on the CI yet, so you may have to build the package yourself on Windows.

The package has no dependencies if you just want to run inference on the models.