HotpotQA
Emerging
27 papers using it
1,389 HF downloads
16 HF likes
2024 first seen
Dataset Card for BEIR Benchmark: hotpotqa is one of the datasets from the Question Answering task within BEIR, measuring Wikipedia article retrieval for a given multi-hop query. Dataset Summary: BEIR is a heterogeneous benchmark built from 18 diverse datasets representing 9 information retrieval tasks. Fact-checking: FEV
Hugging Face · cc-by-sa-4.0
Papers using HotpotQA (27)
- Retrieval as Reasoning: Self-Evolving Agent-Native Retrieval via LLM-Wiki
- Benchmarking Prompt Sensitivity in Large Language Models
- Knowledge Graph-Guided Retrieval Augmented Generation
- The Efficiency Frontier: A Unified Framework for Cost-Performance Optimization in LLM Context Management
- H$^{2}$MT: Semantic Hierarchy-Aware Hierarchical Memory Transformer
- When Do LLM Agents Treat Surface Noise Differently from Semantic Noise? A 68-Cell Measurement Study with a Held-Out Trace-Level Validation
- PersonalAI: A Systematic Comparison of Knowledge Graph Storage and Retrieval Approaches for Personalized LLM agents
- NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
- PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents
- One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA
- Calibrating LLMs with Semantic-level Reward
- Search, Do not Guess: Teaching Small Language Models to Be Effective Search Agents
- COMI: Coarse-to-fine Context Compression via Marginal Information Gain
- MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
- NOVA: NOise-aware Verbal Confidence CAlibration for Robust Large Language Models in RAG Systems
- Reevaluating Self-Consistency Scaling in Multi-Agent Systems
- A State-Update Prompting Strategy for Efficient and Robust Multi-turn Dialogue
- EAPO: Enhancing Policy Optimization with On-Demand Expert Assistance
- Open Data Synthesis For Deep Research
- Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models
- MacRAG: Compress, Slice, and Scale-up for Multi-Scale Adaptive Context RAG
- Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation
- A Training-free LLM Framework with Interaction between Contextually Related Subtasks in Solving Complex Tasks
- Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception
- EDGE: Efficient Data Selection for LLM Agents via Guideline Effectiveness
- LLMQuoter: Enhancing RAG Capabilities Through Efficient Quote Extraction From Large Contexts
- Inference Scaling for Bridging Retrieval and Augmented Generation