RAR-b: Reasoning as Retrieval Benchmark
2024 · Chenghao Xiao, G Thomas Hudson, Noura Al Moubayed
Abstract
Semantic textual similarity (STS) and information retrieval (IR) tasks have been the two major avenues to record the progress of embedding models in the past few years. Under the emerging Retrieval-augmented Generation (RAG) paradigm, we envision the need to evaluate next-level language understanding abilities of embedding models, and take a conscious look at the reasoning abilities stored in them. Addressing this, we pose the question: Can retrievers solve reasoning problems? By transforming reasoning tasks into retrieval tasks, we find that without being specifically trained for reasoning-level language understanding, current state-of-the-art retriever models may still be far from being competent for playing the role of assisting LLMs, especially in reasoning-intensive tasks. Moreover, albeit trained to be aware of instructions, instruction-aware IR models are often better off without instructions at inference time for reasoning tasks, posing an overlooked retriever-LLM behavioral ga
Related papers
- IRSC: A Zero-shot Evaluation Benchmark For Information Retrieval Through Semantic Comprehension In Retrieval-augmented Generation Scenarios (2024) 2.86
- Frustratingly Simple Retrieval Improves Challenging, Reasoning-intensive Benchmarks (2025) 0.00
- Reasoning-augmented Representations For Multimodal Retrieval (2026) 0.00
- Mr\(^2\)-bench: Going Beyond Matching To Reasoning In Multimodal Retrieval (2025) 1.81
- Optimizing Retrieval For RAG Via Reinforcement Learning (2025) 0.00
- TRACE: Task-adaptive Reasoning And Representation Learning For Universal Multimodal Retrieval (2026) 0.00
- R2MED: A Benchmark For Reasoning-driven Medical Retrieval (2025) 2.51
- Neurosymbolic Retrievers For Retrieval-augmented Generation (2026) 0.00