Pre-train A Discriminative Text Encoder For Dense Retrieval Via Contrastive Span Prediction
2022 · Xinyu Ma, Jiafeng Guo, Ruqing Zhang, et al.
Abstract
Dense retrieval has shown promising results in many information retrieval (IR) tasks, and its foundation is high-quality text representation learning for effective search. Some recent studies have shown that autoencoder-based language models can boost dense retrieval performance by using a weak decoder. However, we argue that 1) decoding all of the input text is not discriminative, and 2) even a weak decoder has a bypass effect on the encoder. Therefore, in this work, we introduce a novel contrastive span prediction task to pre-train the encoder alone while still retaining the bottleneck ability of the autoencoder. The key idea is to force the encoder to generate a text representation that is close to its own random spans and far from the spans of other texts, using a group-wise contrastive loss. In this way, we can 1) learn discriminative te
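The group-wise contrastive objective described above can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal illustration under stated assumptions. The function names are hypothetical. Spans are sampled with uniform random lengths between `min_len` and `max_len`, similarity is cosine similarity scaled by a temperature `tau` (assumed 0.1), and every span in the batch, including a text's other positives, appears in the softmax denominator. The encoder itself is replaced by precomputed toy vectors.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sample_random_spans(tokens: List[str], num_spans: int, min_len: int,
                        max_len: int, rng: random.Random) -> List[List[str]]:
    """Draw contiguous token spans of random length (assumed sampling scheme).

    Assumes 1 <= min_len <= len(tokens).
    """
    spans = []
    for _ in range(num_spans):
        length = rng.randint(min_len, min(max_len, len(tokens)))
        start = rng.randint(0, len(tokens) - length)
        spans.append(tokens[start:start + length])
    return spans


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def group_contrastive_loss(text_vecs: List[Vector],
                           span_vecs: List[List[Vector]],
                           tau: float = 0.1) -> float:
    """Multi-positive InfoNCE (assumed form): text i should score its own
    spans above the spans of every other text in the batch."""
    # Flatten all spans in the batch, remembering which text each came from.
    pool: List[Tuple[int, Vector]] = [
        (owner, s) for owner, spans in enumerate(span_vecs) for s in spans
    ]
    total, count = 0.0, 0
    for i, h in enumerate(text_vecs):
        logits = [cosine(h, s) / tau for _, s in pool]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        for (owner, _), z in zip(pool, logits):
            if owner == i:  # each of the text's own spans is a positive
                total += -(z - log_denom)
                count += 1
    return total / count  # mean negative log-likelihood over all positives


rng = random.Random(0)
tokens = "dense retrieval maps queries and passages into one vector space".split()
spans = sample_random_spans(tokens, num_spans=3, min_len=2, max_len=5, rng=rng)

# Toy 2-d embeddings (made up): text 0 points along x, text 1 along y,
# and each text's spans point roughly in the same direction as the text.
texts = [[1.0, 0.0], [0.0, 1.0]]
good = [[[0.9, 0.1], [1.0, 0.2]], [[0.1, 0.9], [0.2, 1.0]]]
bad = [good[1], good[0]]  # swap span groups so each text gets the other's spans
print(group_contrastive_loss(texts, good) < group_contrastive_loss(texts, bad))  # True
```

Because all K positives share one denominator in this variant, the averaged loss cannot drop below log K even when the positives dominate. A formulation that removes the other positives from each term's denominator would avoid this floor. Which choice the paper makes is not stated in this excerpt.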
Related papers
- Less Is More: Pre-train A Strong Text Encoder For Dense Retrieval Using A Weak Decoder (2021) · 14.29
- Drop Your Decoder: Pre-training With Bag-of-word Prediction For Dense Passage Retrieval (2024) · 3.58
- Condenser: A Pre-training Architecture For Dense Retrieval (2021) · 14.90
- Unsupervised Dense Information Retrieval With Contrastive Learning (2021) · 0.00
- Challenging Decoder Helps In Masked Auto-encoder Pre-training For Dense Passage Retrieval (2023) · 0.00
- Interpret And Control Dense Retrieval With Sparse Latent Features (2024) · 2.26
- Analysing The Robustness Of Dual Encoders For Dense Retrieval Against Misspellings (2022) · 9.59
- Text And Code Embeddings By Contrastive Pre-training (2022) · 0.00