from Hacker News

Native Sparse Attention: Hardware-Aligned and Natively Trainable

by teepo on 2/19/25, 1:15 PM with 0 comments