from Hacker News

Vector DB with no network latency – SQLite

by elamje on 7/6/23, 6:26 PM with 5 comments

  • by pedrovhb on 7/6/23, 8:43 PM

    I see the appeal, and I'd totally consider using SQLite as a vector store with the proper extensions/support (I'd imagine this exists; does it?). But the code shown there really isn't an apples-to-apples comparison, is it? Every query fetches all the vectors, deserializes each one from JSON, allocates memory, and instantiates them as numpy arrays, then does an O(n) cosine-similarity search over embeddings which, in the example, aren't normalized. At this point, network latency for a (presumably loopback) gRPC call isn't what I'd be concerned with. There's really no reason to use SQLite at all in this case: just keep everything in memory and save state to disk if that's what you care about.
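
To make the "keep everything in memory" suggestion concrete, here is a minimal sketch of a brute-force index that normalizes each vector once, when it is added, so every query reduces to plain dot products. It is not the code from the linked post: `InMemoryIndex`, `normalize`, and the sample vectors are hypothetical names and data chosen for illustration, and it uses only the standard library.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so cosine similarity becomes a dot product."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class InMemoryIndex:
    """Hypothetical brute-force index; vectors are normalized once, at insert time."""

    def __init__(self):
        self.keys = []
        self.vectors = []

    def add(self, key, vec):
        self.keys.append(key)
        self.vectors.append(normalize(vec))

    def search(self, query, k=1):
        """Return the k best (key, cosine similarity) pairs, best first."""
        q = normalize(query)
        scored = [
            (sum(a * b for a, b in zip(v, q)), key)
            for key, v in zip(self.keys, self.vectors)
        ]
        scored.sort(reverse=True)  # still O(n) scoring, but no per-query parsing
        return [(key, score) for score, key in scored[:k]]


index = InMemoryIndex()
index.add("a", [1.0, 0.0])
index.add("b", [0.0, 2.0])
index.add("c", [3.0, 3.0])
top = index.search([1.0, 0.9], k=2)
```

The search is still linear in the number of vectors, but nothing is fetched, parsed, or re-normalized per query. With numpy, the stored vectors would live in one 2-D array and the scoring loop would become a single matrix-vector product.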
  • by nickpeterson on 7/6/23, 11:04 PM

    Wouldn’t the competitor here be DuckDB? I feel like you might have to implement the similarity function yourself, but it should perform vector operations faster (one would hope).
  • by scotty79 on 7/6/23, 9:47 PM

    Why SQLite? You can write an array to a file directly. They read the whole dataset each time they want to find the closest match.
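
As a rough sketch of what "write an array to a file directly" could look like, the snippet below stores fixed-width float32 vectors back to back in one binary file using the standard library's `array` module. The dimension `DIM`, the helper names, and the `vectors.f32` file name are assumptions made for this example; the file uses the machine's native byte order, so it is not portable across architectures as written.

```python
import os
import tempfile
from array import array

DIM = 4  # assumed embedding dimension for this sketch


def save_vectors(path, vectors):
    """Write all vectors back to back as raw float32 values."""
    flat = array("f")
    for vec in vectors:
        if len(vec) != DIM:
            raise ValueError("every vector must have DIM components")
        flat.extend(vec)
    with open(path, "wb") as fh:
        flat.tofile(fh)


def load_vectors(path):
    """Read the whole file in one call and split it into DIM-sized rows."""
    count = os.path.getsize(path) // (DIM * array("f").itemsize)
    flat = array("f")
    with open(path, "rb") as fh:
        flat.fromfile(fh, count * DIM)
    return [list(flat[i * DIM:(i + 1) * DIM]) for i in range(count)]


path = os.path.join(tempfile.mkdtemp(), "vectors.f32")
original = [[0.5, 1.0, -2.0, 0.25], [1.5, 0.0, 3.0, -4.0]]
save_vectors(path, original)
restored = load_vectors(path)
```

Loading is a single bulk read with no JSON parsing. The sample values are exactly representable in float32, so they round-trip unchanged; arbitrary Python floats would lose some precision.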
  • by jerrygenser on 7/7/23, 12:08 AM

    You can also use tools like annoy or nmslib/hnsw which are built for this purpose -- store vectors on disk and do high performance similarity search against them.

    Off topic - I'm pleasantly surprised that I could see this Twitter post without being logged in.