
HotpotQA

Emerging

27 papers using it

1,389 HF downloads

16 HF likes

First seen: 2024

Dataset Card for BEIR Benchmark

hotpotqa is one of the datasets from the Question Answering task within BEIR, measuring Wikipedia article retrieval for a given multi-hop query.

Dataset Summary

BEIR is a heterogeneous benchmark built from 18 diverse datasets representing 9 information retrieval tasks.

Fact-checking: FEV…

🤗 Hugging Face · ⚖ cc-by-sa-4.0
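The card frames hotpotqa in BEIR as a retrieval task: given a multi-hop question, rank Wikipedia articles so that the supporting ones come first. Below is a minimal sketch of how such a ranking is typically scored, assuming BEIR's qrels layout (query-id mapped to corpus-id mapped to a relevance grade) and nDCG@10, the metric BEIR reports as its primary score. The query and document IDs (`q1`, `doc_a`, and so on) and the helper names `dcg`, `ndcg_at_k`, and `mean_ndcg` are hypothetical illustrations, not real HotpotQA records and not the BEIR library's own API.

```python
import math
from typing import Dict, List

# Hypothetical toy qrels in BEIR's shape: query-id -> {corpus-id: relevance}.
# These IDs are illustrative only, not real HotpotQA records.
qrels: Dict[str, Dict[str, int]] = {
    "q1": {"doc_a": 1, "doc_b": 1},  # multi-hop: two supporting articles
}

# Hypothetical retriever output: query-id -> corpus-ids ranked best first.
run: Dict[str, List[str]] = {
    "q1": ["doc_a", "doc_x", "doc_b", "doc_y"],
}


def dcg(gains: List[int]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount, ranks starting at 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(qrels_q: Dict[str, int], ranking: List[str], k: int = 10) -> float:
    """nDCG@k for one query: DCG of the ranking divided by the ideal DCG."""
    gains = [qrels_q.get(doc_id, 0) for doc_id in ranking[:k]]
    ideal = sorted(qrels_q.values(), reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0


def mean_ndcg(qrels: Dict[str, Dict[str, int]],
              run: Dict[str, List[str]], k: int = 10) -> float:
    """Average nDCG@k over every judged query; unranked queries score 0."""
    scores = [ndcg_at_k(qrels[q], run.get(q, []), k) for q in qrels]
    return sum(scores) / len(scores) if scores else 0.0


print(round(mean_ndcg(qrels, run), 4))
```

Here the second supporting article is retrieved at rank 3 instead of rank 2, so the score lands slightly below 1.0; a ranking that places both supporting articles in the top two positions scores exactly 1.0.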

Papers using HotpotQA (27)

Retrieval as Reasoning: Self-Evolving Agent-Native Retrieval via LLM-Wiki · 2026

Benchmarking Prompt Sensitivity in Large Language Models · 2025 · 2 cites

Knowledge Graph-Guided Retrieval Augmented Generation · 2025 · 2 cites

The Efficiency Frontier: A Unified Framework for Cost-Performance Optimization in LLM Context Management · 2026

H²MT: Semantic Hierarchy-Aware Hierarchical Memory Transformer · 2026

When Do LLM Agents Treat Surface Noise Differently from Semantic Noise? A 68-Cell Measurement Study with a Held-Out Trace-Level Validation · 2026

PersonalAI: A Systematic Comparison of Knowledge Graph Storage and Retrieval Approaches for Personalized LLM agents · 2026

NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems · 2026

PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents · 2026

One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA · 2026

Calibrating LLMs with Semantic-level Reward · 2026

Search, Do not Guess: Teaching Small Language Models to Be Effective Search Agents · 2026

COMI: Coarse-to-fine Context Compression via Marginal Information Gain · 2026

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents · 2026

NOVA: NOise-aware Verbal Confidence CAlibration for Robust Large Language Models in RAG Systems · 2026

Reevaluating Self-Consistency Scaling in Multi-Agent Systems · 2025

A State-Update Prompting Strategy for Efficient and Robust Multi-turn Dialogue · 2025

EAPO: Enhancing Policy Optimization with On-Demand Expert Assistance · 2025

Open Data Synthesis For Deep Research · 2025

Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models · 2025

MacRAG: Compress, Slice, and Scale-up for Multi-Scale Adaptive Context RAG · 2025

Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation · 2025

A Training-free LLM Framework with Interaction between Contextually Related Subtasks in Solving Complex Tasks · 2025

Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception · 2025

EDGE: Efficient Data Selection for LLM Agents via Guideline Effectiveness · 2025

LLMQuoter: Enhancing RAG Capabilities Through Efficient Quote Extraction From Large Contexts · 2025

Inference Scaling for Bridging Retrieval and Augmented Generation · 2024
