from Hacker News

Bridging the gap between keyword and semantic search with SPLADE (2024)

by softwaredoug on 5/5/25, 7:13 PM with 2 comments

  • by jbellis on 5/8/25, 3:24 PM

    I'm kind of disappointed in this article. Splade is a cool way to improve the results of a TF/IDF index with minimally invasive changes, and this article obscures that more than it clarifies.

    > Next, my SPLADE implementation in Elasticsearch is oversimplified. If you scroll back up to get_splade_embedding, we extract non-zero elements from vec_np (the SPLADE tokens) but discard their associated weights. This is a missed opportunity. The SPLADE papers use these weights for scoring matches.

    Yes, exactly, that is the whole point of Splade.
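A toy example shows why the discarded weights matter. This is a minimal sketch, not the article's code. It assumes the SPLADE output is a dense, vocabulary-sized vector like the article's `vec_np`. The helpers `to_sparse`, `overlap_score`, and `splade_score` and all weights are hypothetical. In SPLADE, relevance is the dot product of the query's and the document's sparse term-weight vectors. Keeping only the non-zero indices reduces that to counting shared terms.

```python
from typing import Dict, Sequence

# Hypothetical sparse representation: vocabulary index -> SPLADE term weight.
SparseVec = Dict[int, float]


def to_sparse(dense: Sequence[float]) -> SparseVec:
    """Keep the non-zero entries AND their weights (hypothetical helper)."""
    return {i: float(w) for i, w in enumerate(dense) if w != 0.0}


def overlap_score(query: SparseVec, doc: SparseVec) -> int:
    """Weight-free matching: count shared term ids (what discarding weights leaves)."""
    return len(query.keys() & doc.keys())


def splade_score(query: SparseVec, doc: SparseVec) -> float:
    """Weighted matching: dot product of term weights over the shared term ids."""
    return sum(w * doc[i] for i, w in query.items() if i in doc)


# Made-up weights over a 5-term vocabulary, purely for illustration.
query = to_sparse([0.0, 1.2, 0.0, 0.3, 0.0])
doc_a = to_sparse([0.0, 2.0, 0.0, 0.0, 0.1])  # strongly matches query term 1
doc_b = to_sparse([0.0, 0.0, 0.5, 0.4, 0.0])  # weakly matches query term 3

print(overlap_score(query, doc_a), overlap_score(query, doc_b))  # a tie: 1 1
print(splade_score(query, doc_a) > splade_score(query, doc_b))   # True
```

Without weights, both documents share exactly one term with the query and tie. With weights, `doc_a` clearly outranks `doc_b`. That ranking signal is what the quoted implementation throws away.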

    It's probably easier to demonstrate if you drop down a level to Lucene; I don't think you will be able to do it easily with Elastic.

    Tangentially, I haven't looked closely at SPLATE, which tries to marry Splade and ColBERT, but it's an interesting idea. https://arxiv.org/html/2404.13950v1