On Cross-lingual Retrieval With Multilingual Text Encoders
2021 · Robert Litschko, Ivan Vulić, Simone Paolo Ponzetto, et al.
Abstract
In this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a number of diverse language pairs. We first treat these models as multilingual text encoders and benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR. In contrast to supervised language understanding, our results indicate that for unsupervised document-level CLIR -- a setup with no relevance judgments for IR-specific fine-tuning -- pretrained multilingual encoders on average fail to significantly outperform earlier models based on CLWEs. For sentence-level retrieval, we do obtain state-of-the-art performance: the peak scores, however, are met by multilingual encoders that have been further specialized, in a supervised fashion, for sentence understanding tasks, rather than using their vanilla 'off-the-shelf' variants. Following these results, we introduce localized rel
Related papers
- Evaluating Multilingual Text Encoders For Unsupervised Cross-lingual Retrieval (2021)
- What Drives Cross-lingual Ranking? Retrieval Approaches With Multilingual Language Models (2025)
- Bridging Language Gaps: Advances In Cross-lingual Information Retrieval With Multilingual LLMs (2025)
- Boosting Zero-shot Cross-lingual Retrieval By Training On Artificially Code-switched Data (2023)
- Translate-distill: Learning Cross-language Dense Retrieval By Translation And Distillation (2024)
- Transforming LLMs Into Cross-modal And Cross-lingual Retrieval Systems (2024)
- Transfer Learning Approaches For Building Cross-language Dense Retrieval Models (2022)
- Massively Multilingual Sentence Embeddings For Zero-shot Cross-lingual Transfer And Beyond (2018)