from Hacker News

Cascade Inference: Memory Bandwidth Efficient Shared Prefix Batch Decoding

by zhye on 2/8/24, 2:02 PM with 0 comments