from Hacker News

EM-LLM: Human-Inspired Episodic Memory for Infinite Context LLMs

by jbotz on 5/10/25, 7:49 AM with 11 comments

  • by killerstorm on 5/14/25, 7:56 AM

    Note that this works within a single sequence of tokens. It might be consistent with the "episodic memory" metaphor if we consider a particular transformer run as its experience.

    But this might be very different from what people expect from "memory" - i.e. the ability to learn vast amounts of information and retrieve it as necessary.

    This is more like a refinement of transformer attention: instead of running attention over all tokens (which is very expensive, as it's quadratic), it selects a subset of token spans and runs fine-grained attention only on those. So it essentially breaks transformer attention into two parts - coarse-grained (k-NN over token spans) and fine-grained (normal); see the sketch below.

    It might be a great thing for long-context situations. But it doesn't make sense when you want millions of different facts to be considered - packing them into a long context is rather inefficient.
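
    Below is a minimal NumPy sketch of that two-stage idea, not EM-LLM's actual implementation: span summaries are just mean-pooled keys, the coarse stage is a dot-product k-NN over those summaries, and the fine stage is ordinary attention restricted to the retrieved spans. The span length, top-k, and pooling choice are assumptions for illustration.

      import numpy as np

      def softmax(x, axis=-1):
          x = x - x.max(axis=axis, keepdims=True)
          e = np.exp(x)
          return e / e.sum(axis=axis, keepdims=True)

      def two_stage_attention(q, K, V, span_len=64, top_k=4):
          """q: (d,) query; K, V: (n, d) cached keys/values for a long context."""
          n, d = K.shape
          n_spans = (n + span_len - 1) // span_len

          # Coarse stage: score each span by a mean-pooled summary of its keys
          # (a k-NN-style lookup over span representatives, not individual tokens).
          span_keys = np.stack([K[i * span_len:(i + 1) * span_len].mean(axis=0)
                                for i in range(n_spans)])
          top_spans = np.argsort(span_keys @ q)[-top_k:]

          # Fine stage: ordinary scaled dot-product attention, restricted to the
          # tokens inside the retrieved spans (~top_k * span_len tokens, not n).
          idx = np.concatenate([np.arange(i * span_len, min((i + 1) * span_len, n))
                                for i in sorted(top_spans)])
          attn = softmax(K[idx] @ q / np.sqrt(d))
          return attn @ V[idx]

      rng = np.random.default_rng(0)
      n, d = 100_000, 64  # 100k-token cache, 64-dim head
      K = rng.standard_normal((n, d), dtype=np.float32)
      V = rng.standard_normal((n, d), dtype=np.float32)
      q = rng.standard_normal(d, dtype=np.float32)
      out = two_stage_attention(q, K, V)
      print(out.shape)  # (64,); the fine pass attended to ~256 of the 100k cached tokens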

  • by mountainriver on 5/10/25, 11:18 PM

    TTT, cannon layers, and Titans seem like stronger approaches IMO.

    Information needs to be compressed into latent space or it becomes computationally intractable.

  • by p_v_doom on 5/14/25, 6:40 AM

    Interesting. Even before attention existed, I was thinking that the episodic memory model offers something that could be very useful for neural nets, so it's cool to see people testing that.

  • by MacsHeadroom on 5/10/25, 3:40 PM

    So, infinite context length by making it compute-bound instead of memory-bound. Curious how much longer this takes to run, and when it makes sense to use vs. RAG.