Challenging Decoder Helps In Masked Auto-encoder Pre-training For Dense Passage Retrieval
2023 · Zehan Li, Yanzhao Zhang, Dingkun Long, et al.
Abstract
Recent work has explored dense passage retrieval with pre-trained language models, and among these approaches the masked auto-encoder (MAE) pre-training architecture has emerged as the most promising. The conventional MAE framework uses the decoder's passage reconstruction to strengthen the encoder's text representation ability, which in turn improves the resulting dense retrieval system. Because the encoder's representation ability is built through the decoder's reconstruction task, it is reasonable to postulate that a "more demanding" decoder will require a correspondingly stronger encoder. To this end, we propose a novel token-importance-aware masking strategy based on pointwise mutual information that makes the decoder's task more challenging. Importantly, our approach can be implemented in an unsupervised manner and adds no extra cost to the pre-training phase. Our expe
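The abstract names the ingredients (pointwise mutual information, token importance, masking) but not the exact formulation, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes passage-level co-occurrence counts, scores each token by its average PMI with the other tokens in the same passage, and masks the highest-scoring positions. The function names (`pmi_table`, `token_importance`, `mask_important`), the `ratio` parameter, and the averaging rule are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(passages):
    """Passage-level PMI for every token pair that co-occurs at least once.

    PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ), with probabilities estimated
    as the fraction of passages containing the token (or the pair).
    """
    n = len(passages)
    tok_df = Counter()   # number of passages containing each token
    pair_df = Counter()  # number of passages containing each token pair
    for toks in passages:
        uniq = sorted(set(toks))
        tok_df.update(uniq)
        pair_df.update(combinations(uniq, 2))
    return {
        pair: math.log(count * n / (tok_df[pair[0]] * tok_df[pair[1]]))
        for pair, count in pair_df.items()
    }


def token_importance(toks, pmi):
    """Assumed importance score: mean PMI of a token with the rest of its passage."""
    uniq = set(toks)
    scores = {}
    for t in uniq:
        others = [pmi.get(tuple(sorted((t, o))), 0.0) for o in uniq if o != t]
        scores[t] = sum(others) / len(others) if others else 0.0
    return scores


def mask_important(toks, pmi, ratio=0.3, mask_token="[MASK]"):
    """Mask the top `ratio` fraction of positions, ranked by token importance."""
    scores = token_importance(toks, pmi)
    k = max(1, round(ratio * len(toks)))
    # Ties are broken by position so the result is deterministic.
    ranked = sorted(range(len(toks)), key=lambda i: (-scores[toks[i]], i))
    chosen = set(ranked[:k])
    return [mask_token if i in chosen else t for i, t in enumerate(toks)]


corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "purred"],
]
pmi = pmi_table(corpus)
masked = mask_important(corpus[2], pmi)
```

In this toy corpus a token such as "the" appears in every passage, so its PMI with any partner is zero and it is never preferred for masking. Because the statistics come only from raw co-occurrence counts, no labels are needed, which is consistent with the abstract's claim that the strategy is unsupervised.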
Related papers
- Drop Your Decoder: Pre-training With Bag-of-word Prediction For Dense Passage Retrieval (2024) 3.58
- CoT-MAE v2: Contextual Masked Auto-encoder With Multi-view Modeling For Passage Retrieval (2023) 0.00
- MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders Are Better Dense Retrievers (2022) 9.97
- CoT-MoTE: Exploring Contextual Masked Auto-encoder Pre-training With Mixture-of-textual-experts For Passage Retrieval (2023) 0.00
- Less Is More: Pre-train A Strong Text Encoder For Dense Retrieval Using A Weak Decoder (2021) 14.29
- LexMAE: Lexicon-bottlenecked Pretraining For Large-scale Retrieval (2022) 0.00
- Pre-train A Discriminative Text Encoder For Dense Retrieval Via Contrastive Span Prediction (2022) 10.21
- VLMAE: Vision-language Masked Autoencoder (2022) 0.00