IRSC: A Zero-shot Evaluation Benchmark For Information Retrieval Through Semantic Comprehension In Retrieval-augmented Generation Scenarios
2024 Β· Hai Lin, Shaoxiong Zhan, Junyou Su, et al.
Abstract
In Retrieval-Augmented Generation (RAG) tasks using Large Language Models (LLMs), the quality of retrieved information is critical to the final output. This paper introduces the IRSC benchmark for evaluating the performance of embedding models in multilingual RAG tasks. The benchmark encompasses five retrieval tasks: query retrieval, title retrieval, part-of-paragraph retrieval, keyword retrieval, and summary retrieval. Our research addresses the current lack of comprehensive testing and effective comparison methods for embedding models in RAG scenarios. We introduced new metrics: the Similarity of Semantic Comprehension Index (SSCI) and the Retrieval Capability Contest Index (RCCI), and evaluated models such as Snowflake-Arctic, BGE, GTE, and M3E. Our contributions include: 1) the IRSC benchmark, 2) the SSCI and RCCI metrics, and 3) insights into the cross-lingual limitations of embedding models. The IRSC benchmark aims to enhance the understanding and development of accurate retrieva
Authors
(none)
Tags
Stats
Related papers
- Rar-b: Reasoning As Retrieval Benchmark (2024)2.68
- BEIR: A Heterogenous Benchmark For Zero-shot Evaluation Of Information Retrieval Models (2021)6.67
- Evaluating Embedding Apis For Information Retrieval (2023)8.09
- Visual-rag: Benchmarking Text-to-image Retrieval Augmented Generation For Visual Knowledge Intensive Queries (2025)0.00
- Beyond Benchmarks: Evaluating Embedding Model Similarity For Retrieval Augmented Generation Systems (2024)0.00
- Re-ranking The Context For Multimodal Retrieval Augmented Generation (2025)0.00
- Visr-bench: An Empirical Study On Visual Retrieval-augmented Generation For Multilingual Long Document Understanding (2025)0.00
- Are We On The Right Way For Assessing Document Retrieval-augmented Generation? (2025)0.00