by johnjwang on 5/30/24, 3:17 PM with 57 comments
by pamelafox on 5/30/24, 7:56 PM
https://github.com/Azure-Samples/rag-postgres-openai-python/
Here's the RRF+Hybrid part: https://github.com/Azure-Samples/rag-postgres-openai-python/...
That's largely based off a sample from the pgvector repo, with a few tweaks.
Agreed that Hybrid is the way to go, it's what the Azure AI Search team also recommends, based off their research:
https://techcommunity.microsoft.com/t5/ai-azure-ai-services-...
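For anyone who wants the gist without digging through the repo, the two-query + RRF pattern looks roughly like this (a minimal sketch, not the sample's actual code; the table/column names, text-search config, and psycopg usage are assumptions):

    # Sketch only: documents(id, content, embedding) is an assumed schema,
    # conn is a psycopg connection, query_embedding is a numpy array.
    from pgvector.psycopg import register_vector

    K = 60  # standard RRF smoothing constant

    def hybrid_search(conn, query_text, query_embedding, limit=20):
        register_vector(conn)
        with conn.cursor() as cur:
            # Dense retrieval: nearest neighbors by cosine distance.
            cur.execute(
                "SELECT id FROM documents ORDER BY embedding <=> %s LIMIT %s",
                (query_embedding, limit),
            )
            vector_ids = [row[0] for row in cur.fetchall()]
            # Keyword retrieval: Postgres full-text search.
            cur.execute(
                """SELECT id FROM documents
                   WHERE to_tsvector('english', content) @@ websearch_to_tsquery('english', %s)
                   ORDER BY ts_rank_cd(to_tsvector('english', content),
                                       websearch_to_tsquery('english', %s)) DESC
                   LIMIT %s""",
                (query_text, query_text, limit),
            )
            keyword_ids = [row[0] for row in cur.fetchall()]
        # Reciprocal Rank Fusion: each list contributes 1 / (K + rank) per document.
        scores = {}
        for ids in (vector_ids, keyword_ids):
            for rank, doc_id in enumerate(ids, start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (K + rank)
        return sorted(scores, key=scores.get, reverse=True)[:limit]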
by thefourthchime on 5/30/24, 9:48 PM
At first, I downloaded entire channels, loaded them into a vector DB, and did RAG. The results sucked. Vector searches don't understand things very well, and in this world, specific keywords and error messages are very searchable.
Instead, I take the user's query, ask an LLM (Claude / Bedrock) to find keywords, then search Slack using the API, get results, and use an LLM to filter for discussions that are relevant, then summarize them all in a response.
This is slow, of course, so it's very multi-threaded. A typical response will be within 30 seconds.
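For anyone curious, the shape of that pipeline is roughly the following (a rough sketch under assumptions: ask_llm() is a stand-in for whatever Claude/Bedrock call you make, and the prompts, token, and thread count are made up):

    from concurrent.futures import ThreadPoolExecutor
    from slack_sdk import WebClient

    slack = WebClient(token="xoxp-...")  # search.messages requires a user token

    def ask_llm(prompt: str) -> str:
        raise NotImplementedError("call Claude via Bedrock here")

    def answer(question: str) -> str:
        # 1. Have the LLM pull searchable keywords out of the question.
        keywords = ask_llm(f"Extract 3-5 Slack search keywords for: {question}")
        # 2. Keyword search against Slack itself.
        hits = slack.search_messages(query=keywords, count=50)["messages"]["matches"]

        # 3. LLM relevance filter, fanned out across threads since each check is slow.
        def keep_if_relevant(hit):
            text = hit.get("text", "")
            verdict = ask_llm(f"Question: {question}\nMessage: {text}\nRelevant? yes/no")
            return text if verdict.strip().lower().startswith("yes") else None

        with ThreadPoolExecutor(max_workers=8) as pool:
            relevant = [t for t in pool.map(keep_if_relevant, hits) if t]

        # 4. Summarize whatever survived the filter.
        return ask_llm("Summarize these Slack discussions to answer the question:\n"
                       + question + "\n\n" + "\n---\n".join(relevant))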
by edude03 on 5/30/24, 4:07 PM
I was however tripped up by this sentence close to the beginning:
> we encountered a significant challenge with RAG: relying solely on vector search (even using both dense and sparse vectors) doesn’t always deliver satisfactory results for certain queries.
Not to be overly pedantic, but that's a problem with vector similarity, not RAG as a concept.
Although the author is clearly aware of that, I have had numerous conversations in the past few months alone with people essentially saying "RAG doesn't work because I use pg_vector (or whatever) and it never finds what I'm looking for", not realizing 1) that's not the only way to do RAG, and 2) there is often a fair gap between the stored embeddings and the vectorized query, and once you understand why that is, you can figure out how to fix it.
https://medium.com/@cdg2718/why-your-rag-doesnt-work-9755726... basically says everything I often say to people with RAG/vector search problems but again, seems like the assembled team has it handled :)
by eskibars on 5/31/24, 11:52 AM
The problem is that most people don't have experience optimizing even one of the retrieval systems (vector or keyword), so a lot of users who try to DIY it have an awful time getting to prod. People are talking about things like RRF (which is needed) but then missing other big-picture things, like the mistakes everyone makes when building out keyword search (not getting the right language rules in place) and not getting the vector side right (finding the right embedding model, chunking strategy, etc.).
I recognize I have a bit of a conflict of interest since I'm at a RAG vendor, but I'll abstain from the name/self-promotion and say: I've seen so many cases where people get this wrong that if you're thinking RAG, you really should be hiring a consultant or looking at a complete platform from people who have done it before. Or be prepared to spend a lot of cycles learning and iterating.
by pmc00 on 5/30/24, 8:18 PM
We also included supporting data in that write-up showing you can improve significantly on top of hybrid/RRF using a reranking stage (assuming you have a good reranker model), so we shipped one as an optional step in our search engine.
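If you want to try the same idea outside our stack, the rerank stage slots in right after fusion, roughly like this (an open-source cross-encoder used as a stand-in, not our reranker; the model name and candidate handling are assumptions):

    from sentence_transformers import CrossEncoder

    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    def rerank(query: str, candidates: list[str], top_k: int = 10) -> list[str]:
        # Score each (query, passage) pair jointly; much more expensive than
        # bi-encoder retrieval, which is why it only runs on the fused top-N.
        scores = reranker.predict([(query, passage) for passage in candidates])
        ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
        return [passage for passage, _ in ranked[:top_k]]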
by cheesyFish on 5/30/24, 9:40 PM
LlamaIndex has a module for exactly this
https://docs.llamaindex.ai/en/stable/examples/retrievers/rel...
by yingfeng on 5/31/24, 2:09 AM
On a related note, let me introduce another database we developed, Infinity (https://github.com/infiniflow/infinity), which provides hybrid search. You can see the performance numbers here (https://github.com/infiniflow/infinity/blob/main/docs/refere...); both vector search and full-text search perform much faster than other open-source alternatives.
From the next version (a few weeks out), Infinity will also provide more comprehensive hybrid search capabilities: the 3-way recall you mentioned (dense vector, sparse vector, keyword search) will be available within a single request.
by retakeming on 5/30/24, 7:33 PM
by throwaway115 on 5/30/24, 6:41 PM
by janalsncm on 5/30/24, 10:21 PM
There are a couple of ways around this: learn the relative importance of each signal based on the query, and/or use a separate reranking function (usually a DNN) that also takes user behavior into account.
by ko_pivot on 5/30/24, 8:52 PM
by gregnr on 5/30/24, 8:36 PM
(disclaimer: supabase dev who went down the rabbit hole with hybrid search)
by SomewhatLikely on 5/30/24, 5:46 PM
So I'm not sure why the article uses 1/Rank alone. Did you test both and find that the smoothing didn't help? My understanding is that it has been pretty important for the best results.
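For reference, the constant in the original RRF paper is k = 60, and it changes the fusion behavior quite a bit (toy numbers, nothing from the article):

    k = 60
    for rank in (1, 2, 10):
        print(rank, round(1 / rank, 3), round(1 / (k + rank), 4))
    # rank 1:  1.0    0.0164
    # rank 2:  0.5    0.0161
    # rank 10: 0.1    0.0143
    # Without k, being #1 in either list dominates the fused score; with k, the
    # fusion mostly rewards documents that show up in both lists.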
by owen-elliott on 5/31/24, 6:08 AM
While it's not such a problem in RAG, one downside is that it complicates pagination for results (there are a few different ways to tackle this).
by cricketlover on 5/31/24, 6:13 AM
> Out-of-sync document stores could lead to subtle bugs, such as a document being present in one store but not another.
But then the article suggests uploading synchronously to S3/DDB and then syncing asynchronously to the actual document stores. How does this solve the out-of-sync issue? It doesn't. My thinking is that it can't be solved.
> Data, numbers
How much data are we talking about?
by marcyb5st on 5/31/24, 3:57 PM
Additionally, adding conditional fuzzy matching into the mix, so that fat-fingering something still yields a workable result, is even better for UX (something along the lines of "the results from the tf-idf search are garbage, let's redo the search with fuzzy matching this time").
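Roughly what I have in mind, sketched with rapidfuzz (the library choice, the 0.2 threshold, and matching against titles are all just illustrative):

    from rapidfuzz import fuzz, process

    def search_with_fallback(query, tfidf_search, titles, min_score=0.2):
        results = tfidf_search(query)  # assumed to return [(doc, score), ...] sorted by score
        if results and results[0][1] >= min_score:
            return [doc for doc, _ in results]
        # Keyword results are garbage -- probably a typo, so redo with fuzzy matching.
        matches = process.extract(query, titles, scorer=fuzz.WRatio, limit=10)
        return [title for title, score, _ in matches if score > 70]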
by _pdp_ on 5/31/24, 10:51 AM
by treprinum on 5/30/24, 8:24 PM
by mtbarta3 on 5/30/24, 8:45 PM
The tradeoffs of using existing systems vs. building your own resonate with me. What we eventually experienced, however, is that periods of bad search performance often correlated with out-of-date search indices.
I'd be interested in another article detailing how you monitor search. It can be tricky to keep an entire search system moving.
by esafak on 5/30/24, 6:51 PM
2. If anyone is observing significant gains from incorporating knowledge graphs into the retrieval step, what kind of a knowledge graph are you working with, what is your retrieval algorithm, and what technology are you using to store it?
by ndricca on 5/30/24, 8:20 PM
by cpursley on 5/30/24, 6:30 PM
by armind on 5/31/24, 10:16 AM
by yding on 5/30/24, 5:07 PM