multi-hop QA benchmarks

Emerging

6 papers using it

First seen: 2025

Multi-hop QA benchmarks are datasets that evaluate whether a model can answer questions requiring reasoning over multiple pieces of evidence, where no single passage contains the full answer.
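To make the definition concrete, the sketch below shows what a single multi-hop item might look like and how a prediction could be scored against it. The `MultiHopExample` record layout, the field names, and the sample question are illustrative assumptions and are not taken from any specific benchmark. The exact-match scoring with light normalization is a common convention in QA evaluation, not necessarily the metric used by the papers listed here.

```python
import re
import string
from dataclasses import dataclass


@dataclass
class MultiHopExample:
    """Hypothetical record layout; real benchmarks define their own schemas."""
    question: str
    evidence: list[str]  # passages that must be combined to reach the answer
    answer: str


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    """Return True when prediction and gold answer agree after normalization."""
    return normalize(prediction) == normalize(gold)


# Illustrative item: answering needs hop 1 (book -> author)
# followed by hop 2 (author -> birthplace).
example = MultiHopExample(
    question="In which country was the author of 'Dubliners' born?",
    evidence=[
        "'Dubliners' is a short story collection by James Joyce.",
        "James Joyce was born in Dublin, Ireland.",
    ],
    answer="Ireland",
)

print(exact_match("ireland.", example.answer))
print(exact_match("Dublin", example.answer))
```

Neither evidence passage answers the question on its own: the first links the book to its author, and the second links the author to a birthplace. That dependency between passages is what distinguishes a multi-hop item from a single-hop one.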


Papers using multi-hop QA benchmarks (6)

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning (2025)

Think Big, Search Small: Where Capacity Matters in Hierarchical Search Agents? (2026)

Divide and Cooperate: Role-Decomposed Multi-Agent LLM Training with Cross-Agent Learning Signals (2026)

Beyond Monolithic Architectures: A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search (2026)

Do Agents Need to Plan Step-by-Step? Rethinking Planning Horizon in Data-Centric Tool Calling (2026)

StepPO: Step-Aligned Policy Optimization for Agentic Reinforcement Learning (2026)