from Hacker News

Eagle-3 Speculative Decoding for LLM Inference (5.6x speedup)

by summarity on 4/6/25, 9:00 PM with 0 comments