multi-hop QA benchmarks
Emerging
5 papers using it
First seen: 2025
Multi-hop QA benchmarks are datasets that evaluate a model's multi-step reasoning: each question can only be answered by gathering and synthesizing information from multiple sources.
Papers using multi-hop QA benchmarks (5)
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
- Divide and Cooperate: Role-Decomposed Multi-Agent LLM Training with Cross-Agent Learning Signals
- Beyond Monolithic Architectures: A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search
- Do Agents Need to Plan Step-by-Step? Rethinking Planning Horizon in Data-Centric Tool Calling
- StepPO: Step-Aligned Policy Optimization for Agentic Reinforcement Learning