from Hacker News

Faster semantic search with HNSW indexes in pgvector

by pashkinelfe on 9/7/23, 9:51 AM with 2 comments

  • by pashkinelfe on 9/7/23, 9:51 AM

    pgvector 0.5.0 adds HNSW indexes, which are much faster than IVF for AI applications in most cases. We measured pgvector performance on Supabase and present recommendations for using it efficiently.

    - HNSW preserves index quality even after massive table updates and doesn't need to be rebuilt

    - HNSW search is several times faster than IVF at high accuracy, and the gap is likely even larger at lower accuracy

    - pgvector with HNSW is now faster than Qdrant in our benchmarks

    - IVFFlat index build time was roughly halved in 0.5.0

    - An HNSW index can be built incrementally: you don't need to load all embeddings before creating the index
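
    For anyone who wants to try the points above, here is a minimal sketch of the SQL involved, generated from Python so it can be inspected without a database. The table name `items`, the column name `embedding`, and the 1536 dimension are assumptions for illustration; the parameter values are just starting points, not tuned recommendations.

```python
# Hypothetical names: table "items", column "embedding".
DIM = 1536  # assumed dimension, e.g. OpenAI ada-002-style embeddings


def hnsw_index_sql(table, column, opclass="vector_cosine_ops", m=16, ef_construction=64):
    """Build a CREATE INDEX statement for a pgvector HNSW index."""
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} {opclass}) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


def ivfflat_index_sql(table, column, opclass="vector_cosine_ops", lists=100):
    """Build a CREATE INDEX statement for a pgvector IVFFlat index."""
    return (
        f"CREATE INDEX ON {table} USING ivfflat ({column} {opclass}) "
        f"WITH (lists = {lists});"
    )


def incremental_setup(table="items", column="embedding"):
    """Statements in an order that suits HNSW: the index is created on an
    empty table and is then maintained as rows are inserted."""
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE TABLE {table} (id bigserial PRIMARY KEY, {column} vector({DIM}));",
        hnsw_index_sql(table, column),
        # Query-time knob: larger values trade speed for accuracy.
        "SET hnsw.ef_search = 40;",
    ]


for stmt in incremental_setup():
    print(stmt)
```

    An IVFFlat index, by contrast, is normally created after the table already holds data, because its lists are derived from the rows present at build time.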

  • by egorr on 9/7/23, 10:11 AM

    hey hn, supabase engineer and blogpost coauthor here.

    we ran our first experiments with the HNSW index for vector search and saw about 5 times better performance than IVF on high-dimensional vectors, such as embedding-ada by OpenAI.

    we haven’t included results for smaller models like gte-small (384d) just yet, but we’re currently running those benchmarks as i write this comment. in our smoke tests the difference in performance isn’t as pronounced, but it still appears promising, suggesting that switching to HNSW could be beneficial for the majority of use cases.

    oh, and there have been improvements to IVF as well: index build times dropped by roughly half. so if you’re planning to stick with IVF, it’s worth upgrading to the latest version to get them.

    you can find an extended version of the ann testing framework on GitHub [0]. it’s just the original one with a lil bit of code for pgvector and the vecs lib.

    [0] https://github.com/egor-romanov/vector-db-benchmark/tree/fea...
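
    for context on what "accuracy" means in comparisons like these: it is usually the fraction of the true nearest neighbours that the index actually returns. below is a generic sketch of that measurement in plain Python. it is not code from the linked repository, and the tiny dataset and the "approximate" result are made up for illustration.

```python
import math


def exact_knn(query, vectors, k):
    """Indices of the k vectors closest to query by Euclidean distance (brute force)."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate result also contains."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)


# Made-up 2-d data; a real benchmark would use embeddings and many queries.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
truth = exact_knn((0.1, 0.1), vectors, k=2)

# Pretend an index returned ids 0 and 2 instead of the true top 2.
approx = [0, 2]
print(truth, recall_at_k(approx, truth))
```

    speed numbers for two indexes are only comparable when both are tuned to reach the same recall.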