PosIR: Position-aware Heterogeneous Information Retrieval Benchmark
2026 · Ziyang Zeng, Dun Zhang, Yu Yan, et al.
Abstract
In real-world documents, the information relevant to a user query may reside anywhere from the beginning to the end. This makes position bias -- a systematic tendency of retrieval models to favor or neglect content based on its location -- a critical concern. Although recent studies have identified such bias, existing analyses focus predominantly on English, fail to disentangle document length from information position, and lack a standardized framework for systematic diagnosis. To address these limitations, we introduce PosIR (Position-Aware Information Retrieval), the first standardized benchmark designed to systematically diagnose position bias in diverse retrieval scenarios. PosIR comprises 310 datasets spanning 10 languages and 31 domains, with relevance tied to precise reference spans. At its methodological core, PosIR employs a length-controlled bucketing strategy that groups queries by positive document length and analyzes positional effects within each bucket. This design stri
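The length-controlled bucketing idea described in the abstract can be illustrated with a minimal sketch. It is not the benchmark's actual implementation: the bucket edges, the number of position bins, the record fields (`doc_len`, `span_start`, `span_end`, `score`), and the use of the span midpoint as "position" are all assumptions made for illustration. `score` stands in for any per-query retrieval metric.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean

# Hypothetical upper edges (in tokens) for length buckets; the real
# benchmark's bucket boundaries are not given in the abstract.
LENGTH_EDGES = [512, 1024, 2048]
# Hypothetical number of equal-width position bins within a document.
POSITION_BINS = 4


def length_bucket(doc_len):
    """Index of the length bucket the positive document falls into."""
    return bisect_left(LENGTH_EDGES, doc_len)


def position_bin(span_start, span_end, doc_len):
    """Bin of the reference span's midpoint, relative to document length."""
    rel = ((span_start + span_end) / 2) / doc_len
    return min(int(rel * POSITION_BINS), POSITION_BINS - 1)


def positional_profile(records):
    """Mean score per position bin, computed separately per length bucket.

    Comparing bins only inside one bucket keeps document length roughly
    fixed, so differences across bins reflect position rather than length.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for r in records:
        b = length_bucket(r["doc_len"])
        p = position_bin(r["span_start"], r["span_end"], r["doc_len"])
        grouped[b][p].append(r["score"])
    return {
        b: {p: mean(scores) for p, scores in bins.items()}
        for b, bins in grouped.items()
    }
```

Under these assumptions, a model with no position bias would show roughly flat scores across position bins inside each length bucket, while a drop in later bins would suggest that late-positioned content is neglected.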