
multi-hop QA benchmarks

Emerging
5 papers using it
2025 first seen

Multi-hop QA benchmarks are datasets that evaluate a model's multi-step reasoning: answering each question requires gathering and combining information from multiple sources.

Papers using multi-hop QA benchmarks (5)
