C3: Continued Pretraining With Contrastive Weak Supervision For Cross Language Ad-hoc Retrieval
2022 · Eugene Yang, Suraj Nair, Ramraj Chandradevan, et al.
Abstract
Pretrained language models have improved effectiveness on numerous tasks, including ad-hoc retrieval. Recent work has shown that continuing to pretrain a language model with auxiliary objectives before fine-tuning on the retrieval task can further improve retrieval effectiveness. Unlike in monolingual retrieval, designing an appropriate auxiliary task for cross-language mappings is challenging. To address this challenge, we use comparable Wikipedia articles in different languages to further pretrain off-the-shelf multilingual pretrained models before fine-tuning on the retrieval task. We show that our approach yields improvements in retrieval effectiveness.
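The abstract does not spell out the training objective, so the following is only a minimal sketch of what a contrastive objective over comparable article pairs could look like. It assumes each passage has already been encoded into a single vector and uses an in-batch InfoNCE loss with dot-product scoring; the function names, the `temperature` parameter, and the toy vectors are all hypothetical and are not taken from the paper, whose actual scoring function and loss may differ.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(source_vecs, target_vecs, temperature=1.0):
    """Mean in-batch contrastive (InfoNCE) loss.

    Assumption: source_vecs[i] and target_vecs[i] are embeddings of a
    comparable pair (e.g. passages from linked articles in two languages).
    Every other target in the batch is treated as a negative.
    """
    if not source_vecs or len(source_vecs) != len(target_vecs):
        raise ValueError("need equally sized, non-empty batches")

    total = 0.0
    for i, src in enumerate(source_vecs):
        scores = [dot(src, tgt) / temperature for tgt in target_vecs]
        # Log-sum-exp with the max subtracted for numerical stability.
        top = max(scores)
        log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
        # Negative log-probability of the matching (positive) target.
        total += log_denominator - scores[i]
    return total / len(source_vecs)


# Toy embeddings (hypothetical): two pairs whose vectors point the same way.
english = [[1.0, 0.0], [0.0, 1.0]]
other_language = [[1.0, 0.0], [0.0, 1.0]]

aligned_loss = info_nce_loss(english, other_language)
mismatched_loss = info_nce_loss(english, list(reversed(other_language)))
```

In this toy setup the loss is lower when each source vector is paired with its matching target than when the pairs are swapped, which is the signal a continued-pretraining stage of this kind would push the encoder toward.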
Related papers
- Understanding Retrieval-augmented Task Adaptation For Vision-language Models (2024) 0.00
- Improving The Consistency In Cross-lingual Cross-modal Retrieval With 1-to-k Contrastive Learning (2024) 5.84
- Pre-training For Ad-hoc Retrieval: Hyperlink Is Also You Need (2021) 10.35
- Unsupervised Context Aware Sentence Representation Pretraining For Multi-lingual Dense Retrieval (2022) 3.58
- On Cross-lingual Retrieval With Multilingual Text Encoders (2021) 10.35
- Boosting Zero-shot Cross-lingual Retrieval By Training On Artificially Code-switched Data (2023) 4.52
- What Drives Cross-lingual Ranking? Retrieval Approaches With Multilingual Language Models (2025) 0.00
- Boosting Data Utilization For Multilingual Dense Retrieval (2025) 0.00