
HotpotQA

Emerging

39 papers using it

First seen: 2023

Dataset Card for BEIR Benchmark

hotpotqa is one of the datasets from the Question Answering task within BEIR. It measures Wikipedia article retrieval for a given multi-hop query.

Dataset Summary

BEIR is a heterogeneous benchmark built from 18 diverse datasets representing 9 information retrieval tasks:

Fact-checking: FEV…


Papers using HotpotQA (29)

Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory · 2026

Tool-Schema Compression Enables Agentic RAG Under Constrained Context Budgets · 2026

KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents · 2024 · 4 cites

CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs · 2025 · 4 cites

Bridge Evidence: Static Retrieval Utility Does Not Predict Causal Utility in Multi-Step Agentic Search · 2026

Track, Rank, Crack: Epistemic Working Memory Scales Multi-Hop Reasoning in Language Agents · 2026

MemPro: Agentic Memory Systems as Evolvable Programs · 2026

Cascading Hallucination in Agentic RAG: The CHARM Framework for Detection and Mitigation · 2026

AdaMEM: Test-Time Adaptive Memory for Language Agents · 2026

Semantic Early-Stopping for Iterative LLM Agent Loops · 2026

When Latent Agents Lie: KV-Cache Integrity in Multi-Agent LLM Collaboration · 2026

Contrastive Reflection for Iterative Prompt Optimization · 2026

ZEBRA: Zero-shot Budgeted Resource Allocation for LLM Orchestration · 2026

Parallel Context Compaction for Long-Horizon LLM Agent Serving · 2026

Proper Scoring Rules for Agentic Uncertainty Quantification · 2026

Retrieval as Reasoning: Self-Evolving Agent-Native Retrieval via LLM-Wiki · 2026

StepOPSD: Step-Aware Online Preference Distillation for Agent Reinforcement Learning · 2026

Prompt Codebooks: Discrete Compositional Optimization for Language Model Instruction Refinement · 2026

Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback · 2026

Answer Only as Precisely as Justified: Calibrated Claim-Level Specificity Control for Agentic Systems · 2026

GRASP: Graph Agentic Search over Propositions for Multi-hop Question Answering · 2026

Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents · 2026

Scaling Multi-agent Systems: A Smart Middleware for Improving Agent Interactions · 2026

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents · 2026

PseudoAct: Leveraging Pseudocode Synthesis for Flexible Planning and Action Control in Large Language Model Agents · 2026

MAR: Multi-Agent Reflexion Improves Reasoning Abilities in LLMs · 2025

Mission Impossible: Feedback-Guided Dynamic Interactive Planning for Improving Reasoning on LLMs · 2025

DebFlow: Automating Agent Creation via Agent Debate · 2025

Smurfs: Multi-agent System Using Context-efficient DFSDT For Tool Planning · 2024

HotpotQA dataset — papers, benchmarks & downloads · AI Agents