by brucethemoose2 on 6/29/23, 5:27 PM
Another vLLM post... It's cool, but I still can't tell if it's SOTA. Vanilla transformers LLaMA is nowhere near an optimal baseline, especially with quantized/compiled backends around like exLlama, GPTQ, Llama.cpp, TVM Llama, and (I think) JAX Llama and Torch-MLIR Llama.
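
For what it's worth, the comparison I'd want to see is a plain tokens/sec measurement of vLLM against the vanilla transformers baseline on identical prompts and settings. A rough sketch of what I mean (the checkpoint name, batch size, and generation length are just placeholders, and you'd want to run each half in a separate process so the two engines don't fight over GPU memory):

    import time
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from vllm import LLM, SamplingParams

    MODEL = "huggyllama/llama-7b"  # placeholder checkpoint
    PROMPTS = ["Explain KV caching in one paragraph."] * 8
    MAX_NEW = 128

    # --- vLLM (continuous batching + PagedAttention) ---
    llm = LLM(model=MODEL)
    params = SamplingParams(temperature=0.0, max_tokens=MAX_NEW)
    t0 = time.perf_counter()
    outs = llm.generate(PROMPTS, params)
    dt = time.perf_counter() - t0
    n_tok = sum(len(o.outputs[0].token_ids) for o in outs)
    print(f"vLLM: {n_tok / dt:.1f} tok/s")

    # --- vanilla HF transformers baseline (fp16, naive batching) ---
    tok = AutoTokenizer.from_pretrained(MODEL)
    tok.pad_token = tok.eos_token   # LLaMA ships without a pad token
    tok.padding_side = "left"       # left-pad for decoder-only generation
    model = AutoModelForCausalLM.from_pretrained(
        MODEL, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tok(PROMPTS, return_tensors="pt", padding=True).to(model.device)
    t0 = time.perf_counter()
    gen = model.generate(**inputs, max_new_tokens=MAX_NEW, do_sample=False)
    dt = time.perf_counter() - t0
    # rough count: sequences may stop early at EOS
    n_tok = gen.shape[0] * (gen.shape[1] - inputs["input_ids"].shape[1])
    print(f"HF transformers: {n_tok / dt:.1f} tok/s")

Even then, beating vanilla transformers doesn't tell you much. The interesting numbers would be against the tuned backends above.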