from Hacker News

Show HN: FiddleCube – Generate Q&A to test your LLM

by kaushik92 on 6/25/24, 5:26 PM with 18 comments

Convert your vector embeddings into a set of questions and their ideal responses. Use this dataset to test your LLM and catch failures caused by prompt or RAG updates.

Get started in 3 lines of code:

```
pip3 install fiddlecube
```

```
from fiddlecube import FiddleCube

fc = FiddleCube(api_key="<api-key>")
dataset = fc.generate(
    [
        "The cat did not want to be petted.",
        "The cat was not happy with the owner's behavior.",
    ],
    10,
)
dataset
```

Generate your API key: https://dashboard.fiddlecube.ai/api-key

# Ideal QnA datasets for testing, eval and training LLMs

Testing, evaluating, or training LLMs requires an ideal QnA dataset, also known as the golden dataset.

This dataset needs to be diverse, covering a wide range of queries with accurate responses.

Creating such a dataset takes significant manual effort.

Whenever the prompt or RAG context is updated, which happens constantly in early-stage applications, the dataset needs to be updated to match.
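
As a rough illustration of how such a golden dataset is used, the sketch below replays question/ideal-answer pairs against your own pipeline and flags drift. The `ask_rag` function and the list-of-dicts dataset shape are hypothetical placeholders, not FiddleCube's API.

```
# Illustrative only: replay a golden QnA dataset against your own LLM/RAG pipeline.
# `ask_rag` and the dataset shape are hypothetical stand-ins, not FiddleCube's API.

def ask_rag(question: str) -> str:
    """Call your prompt + RAG pipeline here and return its answer."""
    raise NotImplementedError

golden_dataset = [
    {"question": "Why was the cat unhappy?",
     "ideal_answer": "It did not want to be petted by its owner."},
    # ... more generated question/answer pairs ...
]

failures = []
for row in golden_dataset:
    answer = ask_rag(row["question"])
    # Swap this naive substring check for an LLM-as-judge or a similarity score.
    if row["ideal_answer"].lower() not in answer.lower():
        failures.append((row["question"], answer))

print(f"{len(failures)}/{len(golden_dataset)} answers drifted from the golden dataset")
```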

# FiddleCube generates ideal QnA from vector embeddings

- The questions cover the entire RAG knowledge corpus.

- Complex reasoning, safety alignment, and 5 other question types are generated.

- Filtered for correctness, context relevance and style.

- Auto-updated with prompt and RAG updates.
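
The snippet below is a rough sketch of wiring regeneration into a test run so the dataset stays in sync with the RAG corpus. Only the `FiddleCube(api_key=...)` and `fc.generate(chunks, 10)` calls are taken from the quick-start above; `load_corpus_chunks` and `run_eval` are hypothetical placeholders for your own code.

```
# Sketch: regenerate the QnA dataset from the current corpus before each eval run.
# Only FiddleCube(...) and fc.generate(...) come from the quick-start above;
# load_corpus_chunks and run_eval are hypothetical placeholders.
from fiddlecube import FiddleCube


def load_corpus_chunks() -> list[str]:
    """Return the text chunks currently indexed in your vector store."""
    raise NotImplementedError


def run_eval(dataset) -> None:
    """Replay the generated questions against your LLM/RAG pipeline."""
    raise NotImplementedError


fc = FiddleCube(api_key="<api-key>")
chunks = load_corpus_chunks()
dataset = fc.generate(chunks, 10)  # 10 Q&A pairs covering the current corpus
run_eval(dataset)
```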

  • by Loic on 6/26/24, 10:16 AM

    For the people wondering, the GitHub repo only hosts a couple of lines of Python that connect to their API.

    If you have your own LLM, it may have sensitive/private data "in" it from your training, and you may not be legally allowed to use this service.

  • by mistercow on 6/26/24, 1:26 PM

    The bulleted list of what constitutes “ideal” is missing one of the most important types of questions: questions that aren’t answered by the knowledge set, but which seem like they should/might be.

    This is where RAG systems consistently fall down. The end user, by definition, doesn’t know what you’ve got in your data. They won’t ask questions carefully cherry-picked from it. They’ll ask questions they need to know the answer to, and more often than you think, those answers won’t be in your data. You absolutely must know how your system behaves when they do that.

  • by johnsutor on 6/25/24, 6:40 PM

    How does this differ from Ragas? https://docs.ragas.io/en/latest/index.html

  • by cruxcode on 6/25/24, 5:58 PM

    Can it generate HTML as part of the prompt?

  • by praveenkumarnew on 6/25/24, 10:16 PM

    Can I plug this into a Ragas pipeline?

  • by aditikothari on 6/25/24, 7:48 PM

    This is super cool!

  • by arjun9642 on 6/26/24, 12:11 AM

    I want to hack