from Hacker News

Serving LLM 24x Faster on the Cloud with vLLM and SkyPilot

by zhwu on 6/29/23, 5:11 PM with 1 comment

  • by brucethemoose2 on 6/29/23, 5:27 PM

    Another vLLM post... It's cool, but I still can't tell if it's SOTA. Vanilla transformers LLaMA is far from optimal, especially compared to quantized backends like exLlama, GPTQ, llama.cpp, TVM Llama, and (I think) JAX Llama and Torch-MLIR Llama.
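
    For context, the vLLM being compared here exposes a simple batched-generation API on top of its continuous-batching engine. A minimal sketch (the model name is just a placeholder; any HF-format LLaMA checkpoint should work):

        # Minimal sketch of vLLM's offline batched-inference API.
        from vllm import LLM, SamplingParams

        prompts = [
            "Explain PagedAttention in one sentence.",
            "Why does continuous batching raise serving throughput?",
        ]
        sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

        # Weights are loaded once; requests are batched internally by the engine.
        llm = LLM(model="huggyllama/llama-7b")  # placeholder checkpoint

        outputs = llm.generate(prompts, sampling_params)
        for output in outputs:
            print(output.prompt, "->", output.outputs[0].text)

    The throughput claims in the post come from this engine-level batching (unquantized fp16 weights), which is a different axis of optimization than the quantized backends named above.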