from Hacker News

RAG Is Laughably Simple

by josvdwest on 8/20/24, 11:05 PM with 2 comments

When I first heard about RAG (Retrieval-Augmented Generation) I thought it was some sophisticated architectural change that injects extra information into the nodes and attention heads of an LLM. Shortly after, I learned that it’s actually laughably simple. I’m surprised it has its own separate name!

It boils down to 5 steps:

1. Create a representation of all the possible information (text) you’d like to be considered for your question. [info-representation]

2. Create a representation of the question being asked. [question-representation]

3. Find the top N info-representations most similar to your question-representation.

4. Feed all of the information (text) from the top N representations into your LLM of choice (e.g. OpenAI GPT-4o) along with the question.

5. And voilà! Your model will give you an answer given the context you’ve added.

It could almost be called “Expand your LLM prompt with more context”.
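To make that concrete, here’s a minimal sketch of the five steps in Python, assuming the OpenAI SDK for both the embeddings and the chat call; the toy corpus, model names, and top-N value are just placeholders:

```python
# Minimal RAG sketch: embed a corpus, embed the question, retrieve the
# top-N most similar chunks, and stuff them into the prompt.
import numpy as np
from openai import OpenAI  # pip install openai numpy

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 1: create a representation (embedding) of each piece of info.
corpus = [
    "The Eiffel Tower is 330 metres tall.",
    "Python was created by Guido van Rossum.",
    "RAG retrieves relevant text and adds it to the prompt.",
]
emb = client.embeddings.create(model="text-embedding-3-small", input=corpus)
info_vectors = np.array([d.embedding for d in emb.data])

# Step 2: create a representation of the question.
question = "Who invented Python?"
q = client.embeddings.create(model="text-embedding-3-small", input=[question])
q_vector = np.array(q.data[0].embedding)

# Step 3: find the top-N most similar info-representations. OpenAI
# embeddings are unit-length, so a dot product gives cosine similarity.
top_n = 2
scores = info_vectors @ q_vector
best = np.argsort(scores)[::-1][:top_n]

# Steps 4-5: feed the retrieved text plus the question to the LLM.
context = "\n".join(corpus[i] for i in best)
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"Answer using this context:\n{context}\n\nQuestion: {question}",
    }],
)
print(answer.choices[0].message.content)
```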

  • by PaulHoule on 8/20/24, 11:28 PM

    It’s not too different from you yourself searching for a few documents and reading them before formulating an answer.
  • by curious_curios on 8/20/24, 11:47 PM

    You’re glossing over a lot of details here, which is where most of the pain is.

    Properly chunking the data, handling non-standard text formatting in source documents, dealing with source documents that don’t even have OCR’d text, having disparate indexes available per client, minimizing hallucinations even with proper context data, and more.
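
    To see why “properly chunking the data” is harder than it sounds, here’s the naive fixed-size chunker most tutorials start with (a sketch; the sizes are arbitrary). It happily splits sentences, tables, and headings in half, which is exactly where retrieval quality starts to degrade:

    ```python
    def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
        # Naive fixed-size chunking with overlap. It ignores sentence and
        # section boundaries, so a fact can be cut in half across two chunks.
        step = size - overlap
        return [text[i:i + size] for i in range(0, len(text), step)]
    ```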