The bindings currently use code from a pending PR of mine that turns the original code into more of a library. Hopefully it will get merged into the main repository soon. I have also added a few CLI entry points that get installed along with the Python package (a sketch of the full pipeline follows the list):
* llamacpp-convert - convert PyTorch models into GGML format. This is an alias for the existing Python script in llama.cpp and requires PyTorch.
* llamacpp-quantize - perform INT4 quantization on the GGML model. This is a wrapper for the "quantize" C++ program from the original repository and has no dependencies.
* llamacpp-cli - This is a Python version of the "main.cpp" program from the original repository that utilizes the bindings.
* llamacpp-chat - a wrapper over llamacpp-cli that includes a prompt that makes it behave like a chatbot. It is not very good right now.
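Roughly, an end-to-end session should look something like the sketch below. The arguments are illustrative and assume the wrappers mirror the upstream llama.cpp scripts, so check `--help` on each tool:

```
# Illustrative pipeline; arguments assume the wrappers mirror the
# upstream llama.cpp scripts and may differ in practice.
pip install llamacpp

# Convert the PyTorch checkpoint to GGML (needs PyTorch installed).
llamacpp-convert ./models/7B/ 1

# INT4-quantize the converted model.
llamacpp-quantize ./models/7B/

# Run inference with main.cpp-style flags.
llamacpp-cli -m ./models/7B/ggml-model-q4_0.bin -p "Hello, llama!" -n 128
```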
You should theoretically be able to `pip install llamacpp` and get going on most Linux/macOS platforms by just running `llamacpp-cli`. I do not have Windows builds on CI yet, so on Windows you may have to build it yourself.
The package has no dependencies if you just want to run inference on the models.
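If you want to call the bindings directly instead of going through the CLI, a minimal inference loop would look something like the sketch below. The names (`InferenceParams`, `LlamaInference`, and the per-token methods) are illustrative assumptions about the binding surface, not a documented API:

```
# Minimal inference sketch; the class and method names below are
# assumptions about the bindings, not a documented API.
import llamacpp

params = llamacpp.InferenceParams()                 # assumed params object
params.path_model = "./models/7B/ggml-model-q4_0.bin"

model = llamacpp.LlamaInference(params)             # assumed wrapper class
model.update_input("What is the capital of France?")
model.ingest_all_pending_input()

for _ in range(64):                                 # cap generation length
    model.eval()                                    # one forward pass
    token = model.sample()
    print(model.token_to_str(token), end="", flush=True)
```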