from Hacker News

Show HN: Reference-free evaluation of LLM-powered chatbots

by Joschkabraun on 12/4/23, 5:02 PM with 0 comments

Hey HN!

This is an interactive demo with a *somewhat* helpful AI assistant. The goal is to demonstrate a good way to evaluate interactions between humans and AI assistants without references. Reference-free means that you do not provide a correct answer to a query. The metric used here is the goal success ratio, which measures how many queries a user needs to send to reach their goal.
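As a rough illustration only (this is not the demo's actual implementation), here is one way such a metric could be computed. It assumes the ratio is defined as goals reached divided by queries sent, and that some separate step has already judged, for each conversation, how many goals the user reached. The `Conversation` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Conversation:
    """Hypothetical record of one user session with the assistant."""
    queries_sent: int   # number of queries the user sent
    goals_reached: int  # number of goals judged as reached (assumed given)


def goal_success_ratio(conversations: Iterable[Conversation]) -> float:
    """Assumed definition: total goals reached / total queries sent.

    A higher value means users need fewer queries per goal.
    Returns 0.0 when no queries were sent.
    """
    conversations = list(conversations)
    total_queries = sum(c.queries_sent for c in conversations)
    total_goals = sum(c.goals_reached for c in conversations)
    if total_queries == 0:
        return 0.0
    return total_goals / total_queries


# Made-up example data: three sessions, two of which reached a goal.
sessions = [
    Conversation(queries_sent=4, goals_reached=1),
    Conversation(queries_sent=2, goals_reached=1),
    Conversation(queries_sent=3, goals_reached=0),
]
ratio = goal_success_ratio(sessions)  # 2 goals over 9 queries
```

No reference answers appear anywhere in this calculation; the only inputs are query counts and a judgment of whether the user got what they wanted.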

A guide on reference-free evaluation of any LLM app (chat, RAG, summarization, etc.) will follow in the near future.

Try it out and please share any feedback!