by jbotz on 5/10/25, 7:49 AM with 11 comments
by killerstorm on 5/14/25, 7:56 AM
But this might be very different from what people expect from "memory" - i.e., the ability to learn vast amounts of information and retrieve it as needed.
This is more like a refinement of transformer attention: instead of running attention over all tokens (which is very expensive, as it's quadratic in context length), it selects a subset of token spans and runs fine-grained attention only on those. So it essentially breaks transformer attention into two parts - coarse-grained (k-NN over token spans) and fine-grained (normal attention over the selected spans); see the sketch below.
It might be a great thing for long-context situations. But it doesn't make sense when you want millions of distinct facts to be considered - stuffing them all into a long context is rather inefficient.
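A minimal NumPy sketch of that two-stage split (span_size, top_k, and the mean-pooled span summaries are illustrative assumptions, not details from the paper):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def two_stage_attention(q, keys, values, span_size=16, top_k=4):
        """q: (d,) query; keys/values: (n, d) cached context tokens."""
        n, d = keys.shape
        n_spans = n // span_size
        # Coarse stage: summarize each span by its mean key, then pick the
        # top_k spans most similar to the query -- a k-NN over spans.
        spans = keys[:n_spans * span_size].reshape(n_spans, span_size, d)
        chosen = np.argsort(spans.mean(axis=1) @ q)[-top_k:]
        # Fine stage: normal dot-product attention, restricted to tokens
        # inside the selected spans.
        idx = np.concatenate([np.arange(s * span_size, (s + 1) * span_size)
                              for s in chosen])
        w = softmax(keys[idx] @ q / np.sqrt(d))
        return w @ values[idx]

    # Per query this scores n/span_size span summaries plus top_k*span_size
    # tokens instead of all n, which is the point for long contexts.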
by mountainriver on 5/10/25, 11:18 PM
Information needs to be compressed into latent space or it becomes computationally intractable.
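A back-of-envelope version of that claim (the latent size here is a made-up number; cross-attending to a fixed-size latent, Perceiver-style, is one concrete form of the compression):

    n_tokens = 1_000_000          # "millions of facts" kept as raw context
    latent_slots = 4_096          # hypothetical fixed-size latent

    pairwise = n_tokens ** 2              # full self-attention: 1e12 scores
    to_latent = n_tokens * latent_slots   # cross-attention to latent: ~4e9

    print(pairwise // to_latent)          # ~244x fewer scores per layer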
by MacsHeadroom on 5/10/25, 3:40 PM