BEIR: A Heterogenous Benchmark For Zero-shot Evaluation Of Information Retrieval Models
2021 · Nandan Thakur, Nils Reimers, Andreas Rücklé, et al.
Abstract
Existing neural information retrieval (IR) models have often been studied in homogeneous and narrow settings, which has considerably limited insights into their out-of-distribution (OOD) generalization capabilities. To address this, and to help researchers broadly evaluate the effectiveness of their models, we introduce Benchmarking-IR (BEIR), a robust and heterogeneous evaluation benchmark for information retrieval. We leverage a careful selection of 18 publicly available datasets from diverse text retrieval tasks and domains and evaluate 10 state-of-the-art retrieval systems, including lexical, sparse, dense, late-interaction and re-ranking architectures, on the BEIR benchmark. Our results show that BM25 is a robust baseline and that re-ranking and late-interaction-based models achieve the best zero-shot performance on average, though at high computational cost. In contrast, dense and sparse-retrieval models are computationally more efficient but often underperform other approaches.
Related papers
- Resources For Brewing BEIR: Reproducible Reference Models And An Official Leaderboard (2023)
- UniIR: Training And Benchmarking Universal Multimodal Information Retrievers (2023)
- Systematic Evaluation Of Neural Retrieval Models On The Touché 2020 Argument Retrieval Subset Of BEIR (2024)
- Benchmarking And Building Zero-shot Hindi Retrieval Model With Hindi-BEIR And NLLB-E5 (2024)
- MAIR: A Massive Benchmark For Evaluating Instructed Retrieval (2024)
- Hindi-BEIR: A Large Scale Retrieval Benchmark In Hindi (2024)
- IRSC: A Zero-shot Evaluation Benchmark For Information Retrieval Through Semantic Comprehension In Retrieval-augmented Generation Scenarios (2024)
- A Deep Look Into Neural Ranking Models For Information Retrieval (2019)