by Joschkabraun on 12/4/23, 5:02 PM with 0 comments
This is an interactive demo with a *somewhat* helpful AI assistant. The goal is to demonstrate a good way to evaluate interactions between humans and AI assistants without references. Reference-free means that you do not have to provide a correct (reference) answer for each query. The metric used here is the goal success ratio, which measures how many queries a user needs to send to reach their goal.
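To make the metric concrete, here is a minimal Python sketch. It makes two assumptions that do not come from the demo. First, the ratio is taken to be the number of goals reached divided by the number of user queries sent. Second, each user turn carries a hypothetical `goal_reached` label, which could come from a human rater or an LLM judge. No reference answer is needed, only a judgment of whether the user's goal was met.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    query: str          # the user's message to the assistant
    goal_reached: bool  # hypothetical label: did this turn complete a user goal?


def goal_success_ratio(turns: list[Turn]) -> float:
    """Goals reached per user query (assumed definition, not the demo's exact formula)."""
    if not turns:
        return 0.0
    goals = sum(1 for t in turns if t.goal_reached)
    return goals / len(turns)


# Example: the user needs three queries to reach one goal.
conversation = [
    Turn("Book me a flight to Berlin", goal_reached=False),
    Turn("No, for Friday, not Thursday", goal_reached=False),
    Turn("Yes, the 9am one", goal_reached=True),
]
ratio = goal_success_ratio(conversation)
print(round(ratio, 3))  # prints 0.333
```

Under this definition, a ratio of 1.0 means every query completed a goal. Lower values mean the user had to rephrase or correct the assistant more often. The inverse (here 3.0) is the average number of queries needed per goal.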
In the near future, there will be a guide on how to evaluate any LLM app (chat, RAG, summarization, etc.) without references.
Try it out and please share any feedback!