Coverage, Not Averages: Semantic Stratification For Trustworthy Retrieval Evaluation
2026 · Andrew Klearman, Radu Revutchi, Rohin Garg, et al.
Abstract
Retrieval quality is the primary bottleneck for accuracy and robustness in retrieval-augmented generation (RAG). Current evaluation relies on heuristically constructed query sets, which introduce a hidden intrinsic bias. We formalize retrieval evaluation as a statistical estimation problem, showing that metric reliability is fundamentally limited by the evaluation-set construction. We further introduce *semantic stratification*, which grounds evaluation in corpus structure by organizing documents into an interpretable global space of entity-based clusters and systematically generating queries for missing strata. This yields (1) formal semantic coverage guarantees across retrieval regimes and (2) interpretable visibility into retrieval failure modes. Experiments across multiple benchmarks and retrieval methods validate our framework. The results expose systematic coverage gaps, identify structural signals that explain variance in retrieval performance, and show that stratified evaluat
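To make the "coverage, not averages" idea concrete, here is a minimal sketch of why a plain average over a heuristically built query set can differ from a stratum-weighted estimate. It is an illustration under stated assumptions, not the paper's method: the function names, the stratum labels, the corpus-share weights, and the choice to renormalize over covered strata are all hypothetical.

```python
from collections import defaultdict


def plain_average(scores):
    """Unweighted mean over all evaluation queries.

    scores: list of (stratum_id, metric_value) pairs, one per query.
    """
    return sum(value for _, value in scores) / len(scores)


def stratified_average(scores, stratum_weights):
    """Weighted mean of per-stratum means, plus the strata with no queries.

    stratum_weights: dict mapping stratum_id -> assumed share of the corpus.
    The estimate is renormalized over covered strata (an assumption made
    here for illustration); uncovered strata are reported, not imputed.
    """
    by_stratum = defaultdict(list)
    for stratum, value in scores:
        by_stratum[stratum].append(value)

    # Strata that exist in the corpus but have no evaluation query.
    missing = sorted(s for s in stratum_weights if s not in by_stratum)

    covered_weight = sum(
        w for s, w in stratum_weights.items() if s in by_stratum
    )
    if covered_weight == 0:
        raise ValueError("no evaluation query falls in a known stratum")

    weighted_sum = sum(
        stratum_weights[s] * (sum(values) / len(values))
        for s, values in by_stratum.items()
        if s in stratum_weights
    )
    return weighted_sum / covered_weight, missing


# Hypothetical query set: stratum "A" is over-sampled, "C" has no queries.
scores = [("A", 1.0), ("A", 1.0), ("A", 1.0), ("B", 0.0)]
weights = {"A": 0.5, "B": 0.25, "C": 0.25}

naive = plain_average(scores)
estimate, missing = stratified_average(scores, weights)
print(naive, estimate, missing)
```

In this toy case the plain average is inflated by the over-sampled stratum, and the missing stratum is surfaced explicitly rather than silently ignored, which is the kind of coverage gap the abstract describes generating queries for.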