Revela: Dense Retriever Learning Via Language Modeling
2025 · Fengyu Cai, Tong Chen, Xinran Zhao, et al.
Abstract
Dense retrievers play a vital role in accessing external and specialized knowledge to augment language models (LMs). Training dense retrievers typically requires annotated query-document pairs, which are costly to create and scarce in specialized domains (e.g., code) or in complex settings (e.g., requiring reasoning). These practical challenges have sparked growing interest in self-supervised retriever learning. Since LMs are trained to capture token-level dependencies through a self-supervised learning objective (i.e., next token prediction), we can analogously cast retrieval as learning dependencies among chunks of tokens. This analogy naturally leads to the question: How can we adapt self-supervised learning objectives in the spirit of language modeling to train retrievers? To answer this question, we introduce Revela, a unified and scalable training framework for self-supervised retriever learning via language modeling. Revela models semantic dependencies among documents by condi
Related papers
- ExpandR: Teaching Dense Retrievers Beyond Queries With LLM Guidance (2025)
- LLM-augmented Retrieval: Enhancing Retrieval Models Through Language Models And Doc-level Embedding (2024)
- FreeRet: MLLMs As Training-free Retrievers (2025)
- Making Large Language Models Efficient Dense Retrievers (2025)
- Learning To Retrieve: How To Train A Dense Retrieval Model Effectively And Efficiently (2020)
- LMAR: Language Model Augmented Retriever For Domain-specific Knowledge Indexing (2025)
- CSPLADE: Learned Sparse Retrieval With Causal Language Models (2025)
- REVEAL: Retrieval-augmented Visual-language Pre-training With Multi-source Multimodal Knowledge Memory (2022)