CoT-MAE v2: Contextual Masked Auto-encoder With Multi-view Modeling For Passage Retrieval
2023 · Xing Wu, Guangyuan Ma, Peng Wang, et al.
Abstract
A growing number of techniques have emerged to improve the performance of passage retrieval. As an effective representation bottleneck pre-training technique, the contextual masked auto-encoder uses contextual embeddings to assist in the reconstruction of passages. However, it uses only a single auto-encoding pre-task for dense representation pre-training. This study brings multi-view modeling to the contextual masked auto-encoder. First, multi-view representation uses both dense and sparse vectors, aiming to capture sentence semantics from different aspects. Second, the multi-view decoding paradigm uses both auto-encoding and auto-regressive decoders in representation bottleneck pre-training, aiming to provide both reconstructive and generative signals for better contextual representation pre-training. We refer to this multi-view pre-training method as CoT-MAE v2. Through extensive experiments, we show that CoT-MAE v2 is effective and robust on large-scal
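To make the dense-plus-sparse idea concrete, here is a minimal sketch of how a query and a passage might be scored when each has both a dense vector and a sparse term-weight vector. Everything in it is an assumption for illustration: the function names, the top-k sparsification step, and the choice to sum the two scores are hypothetical, since the abstract does not say how the two views are produced or combined.

```python
from typing import Dict, List


def top_k_sparsify(weights: List[float], k: int) -> Dict[int, float]:
    """Keep the k largest positive term weights, keyed by vocabulary index.

    Assumption: a sparse view is obtained by truncating a vocabulary-sized
    weight vector. The abstract does not specify this step.
    """
    ranked = sorted(
        ((i, w) for i, w in enumerate(weights) if w > 0.0),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return dict(ranked[:k])


def dense_score(q: List[float], p: List[float]) -> float:
    """Dot product of two dense vectors of equal length."""
    return sum(a * b for a, b in zip(q, p))


def sparse_score(q: Dict[int, float], p: Dict[int, float]) -> float:
    """Dot product over the vocabulary indices the two sparse vectors share."""
    return sum(w * p[i] for i, w in q.items() if i in p)


def multi_view_score(
    q_dense: List[float],
    q_sparse: Dict[int, float],
    p_dense: List[float],
    p_sparse: Dict[int, float],
) -> float:
    """Hypothetical combination: the plain sum of dense and sparse scores."""
    return dense_score(q_dense, p_dense) + sparse_score(q_sparse, p_sparse)


# Toy example with a 4-term vocabulary and 2-dimensional dense vectors.
query_sparse = top_k_sparsify([0.0, 2.0, 0.5, 1.0], k=2)
passage_sparse = top_k_sparsify([1.0, 3.0, 0.0, 0.0], k=2)
score = multi_view_score([1.0, 0.0], query_sparse, [0.5, 2.0], passage_sparse)
```

In this toy example only vocabulary index 1 survives sparsification on both sides, so the sparse view contributes a single matched term while the dense view contributes the full dot product.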
Related papers
- CoT-MoTE: Exploring Contextual Masked Auto-encoder Pre-training With Mixture-of-textual-experts For Passage Retrieval (2023) 0.00
- Challenging Decoder Helps In Masked Auto-encoder Pre-training For Dense Passage Retrieval (2023) 0.00
- Drop Your Decoder: Pre-training With Bag-of-word Prediction For Dense Passage Retrieval (2024) 3.58
- MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders Are Better Dense Retrievers (2022) 9.97
- Contrastive Audio-visual Masked Autoencoder (2022) 4.93
- Query-as-context Pre-training For Dense Passage Retrieval (2022) 7.68
- Investigating Multi-layer Representations For Dense Passage Retrieval (2025) 0.00
- Noise-robust Dense Retrieval Via Contrastive Alignment Post Training (2023) 0.00