Dense Text Retrieval Based On Pretrained Language Models: A Survey
2022 · Wayne Xin Zhao, Jing Liu, Ruiyang Ren, et al.
Abstract
Text retrieval is a long-standing research topic in information seeking, where a system is required to return relevant information resources in response to users' natural-language queries. From classic retrieval methods to learning-based ranking functions, the underlying retrieval models have continually evolved alongside ongoing technical innovation. To design effective retrieval models, a key point lies in how to learn the text representation and model the relevance matching. The recent success of pretrained language models (PLMs) sheds light on developing more capable text retrieval approaches by leveraging the excellent modeling capacity of PLMs. With powerful PLMs, we can effectively learn the representations of queries and texts in the latent representation space, and further construct the semantic matching function between the dense vectors for relevance modeling. Such a retrieval approach is referred to as dense retrieval, since it employs dense vectors (a.k.a., embeddings) to
Related papers
- Learning To Retrieve: How To Train A Dense Retrieval Model Effectively And Efficiently (2020) · 0.00
- Pre-training With Aspect-content Text Mutual Prediction For Multi-aspect Dense Retrieval (2023) · 5.24
- Pre-training Vs. Fine-tuning: A Reproducibility Study On Dense Retrieval Knowledge Acquisition (2025) · 0.95
- Condenser: A Pre-training Architecture For Dense Retrieval (2021) · 14.90
- Laprador: Unsupervised Pretrained Dense Retriever For Zero-shot Text Retrieval (2022) · 8.82
- CSPLADE: Learned Sparse Retrieval With Causal Language Models (2025) · 0.00
- Sparse And Dense Retrievers Learn Better Together: Joint Sparse-dense Optimization For Text-image Retrieval (2025) · 0.00
- Large Reasoning Embedding Models: Towards Next-generation Dense Retrieval Paradigm (2025) · 0.00