by jbotz on 5/10/25, 7:49 AM with 11 comments
by killerstorm on 5/14/25, 7:56 AM
But this might be very different from what people expect from "memory" - i.e., the ability to learn vast amounts of information and retrieve it as needed.
This is more like a refinement of transformer attention: instead of running attention over all tokens (which is very expensive, as it's quadratic in context length), it selects a subset of token spans and runs fine-grained attention only on those. So it essentially breaks transformer attention into two parts - coarse-grained (k-NN over token spans) and fine-grained (normal attention over the selected spans); see the sketch below.
It might be a great thing for long-context situations. But it doesn't make sense when you want millions of distinct facts to be considered - stuffing them all into a long context is rather inefficient.
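A minimal NumPy sketch of that two-stage split (span_size, top_k, and the mean-pooled span summaries are illustrative assumptions, not details from the paper):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def two_stage_attention(q, keys, values, span_size=16, top_k=4):
        """q: (d,) query; keys/values: (n, d) cached context tokens."""
        n, d = keys.shape
        n_spans = n // span_size
        # Coarse stage: summarize each span by its mean key, then pick the
        # top_k spans most similar to the query -- a k-NN over spans.
        spans = keys[:n_spans * span_size].reshape(n_spans, span_size, d)
        chosen = np.argsort(spans.mean(axis=1) @ q)[-top_k:]
        # Fine stage: normal dot-product attention, restricted to tokens
        # inside the selected spans.
        idx = np.concatenate([np.arange(s * span_size, (s + 1) * span_size)
                              for s in chosen])
        w = softmax(keys[idx] @ q / np.sqrt(d))
        return w @ values[idx]

    # Per query this scores n/span_size span summaries plus top_k*span_size
    # tokens instead of all n, which is the point for long contexts.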
by mountainriver on 5/10/25, 11:18 PM
Information needs to be compressed into latent space or it becomes computationally intractable.
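A back-of-envelope version of that claim (the latent size here is a made-up number; cross-attending to a fixed-size latent, Perceiver-style, is one concrete form of the compression):

    n_tokens = 1_000_000          # "millions of facts" kept as raw context
    latent_slots = 4_096          # hypothetical fixed-size latent

    pairwise = n_tokens ** 2              # full self-attention: 1e12 scores
    to_latent = n_tokens * latent_slots   # cross-attention to latent: ~4e9

    print(pairwise // to_latent)          # ~244x fewer scores per layer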
by MacsHeadroom on 5/10/25, 3:40 PM