from Hacker News

Gemma 3 Inference: vLLM on GKE. Over 22k token/s

by m4r1k on 4/14/25, 7:19 AM with 0 comments