by adg29 on 10/1/24, 11:56 PM with 1 comments
Dataset Overview
- 824 challenging multi-hop questions, each requiring information from 2-15 Wikipedia articles
- Questions span diverse topics, including history, sports, science, animals, and health
- Each question is labeled with the reasoning types it requires: numerical, tabular, multiple constraints, temporal, and post-processing
- Gold answers and the relevant Wikipedia articles are provided for each question
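For orientation, here is a minimal sketch of loading and inspecting the benchmark with the Hugging Face datasets library. The repository id google/frames-benchmark, the split name, and the column names used below are assumptions about how the release is hosted; check the dataset card for the actual schema.

```python
# A minimal sketch, assuming the benchmark is published on the Hugging Face Hub.
# The repo id, split name, and column names are assumptions; consult the dataset card.
from datasets import load_dataset

ds = load_dataset("google/frames-benchmark", split="test")  # repo id is an assumption

print(len(ds))  # expected to be 824 questions

example = ds[0]
# Hypothetical field names: question text, gold answer, source articles, reasoning labels
print(example.get("Prompt"))
print(example.get("Answer"))
print(example.get("wiki_links"))
print(example.get("reasoning_types"))
```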
Key Features
- Tests end-to-end RAG capabilities in a unified framework
- Requires integrating information from multiple sources
- Incorporates complex reasoning and temporal disambiguation
- Designed to be challenging for state-of-the-art language models
Usage
This dataset can be used to:
- Evaluate RAG system performance
- Benchmark language model factuality and reasoning
- Develop and test multi-hop retrieval strategies
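As an illustration of the first use case, an end-to-end evaluation loop over the benchmark might look like the skeleton below. The retrieve, generate_answer, and answers_match functions are hypothetical placeholders for your own retriever, generator, and answer-matching logic, and the field names are assumptions as above; this is a sketch, not an official evaluation harness.

```python
# Skeleton of a RAG evaluation loop over the benchmark (a sketch, not the official harness).
# retrieve(), generate_answer(), and answers_match() are hypothetical placeholders
# for your own retriever, LLM call, and answer-matching logic.
from datasets import load_dataset

def retrieve(question: str, k: int = 5) -> list[str]:
    """Return the top-k passages for a question (placeholder)."""
    raise NotImplementedError

def generate_answer(question: str, passages: list[str]) -> str:
    """Prompt an LLM with the question and retrieved passages (placeholder)."""
    raise NotImplementedError

def answers_match(predicted: str, gold: str) -> bool:
    """Loose string containment; real evaluations often use an LLM judge instead."""
    return gold.strip().lower() in predicted.strip().lower()

ds = load_dataset("google/frames-benchmark", split="test")  # repo id is an assumption

correct = 0
for row in ds:
    question, gold = row["Prompt"], row["Answer"]  # field names are assumptions
    passages = retrieve(question)
    predicted = generate_answer(question, passages)
    correct += answers_match(predicted, gold)

print(f"Accuracy: {correct / len(ds):.3f}")
```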