Evaluating Multilingual Text Encoders For Unsupervised Cross-lingual Retrieval
2021 · Robert Litschko, Ivan Vulić, Simone Paolo Ponzetto, et al.
Abstract
Pretrained multilingual text encoders based on neural Transformer architectures, such as multilingual BERT (mBERT) and XLM, have achieved strong performance on a myriad of language understanding tasks. Consequently, they have been adopted as a go-to paradigm for multilingual and cross-lingual representation learning and transfer, rendering cross-lingual word embeddings (CLWEs) effectively obsolete. However, questions remain to which extent this finding generalizes 1) to unsupervised settings and 2) for ad-hoc cross-lingual IR (CLIR) tasks. Therefore, in this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a large number of language pairs. In contrast to supervised language understanding, our results indicate that for unsupervised document-level CLIR -- a setup with no relevance judgments for IR-specific fine-tuning -- pretrained encoders fail to significa
Related papers
- On Cross-lingual Retrieval With Multilingual Text Encoders (2021) · 10.35
- Transfer Learning Approaches For Building Cross-language Dense Retrieval Models (2022) · 10.97
- What Drives Cross-lingual Ranking? Retrieval Approaches With Multilingual Language Models (2025) · 0.00
- CL2CM: Improving Cross-lingual Cross-modal Retrieval Via Cross-lingual Knowledge Transfer (2023) · 8.60
- Translate-distill: Learning Cross-language Dense Retrieval By Translation And Distillation (2024) · 8.60
- Bridging Language Gaps: Advances In Cross-lingual Information Retrieval With Multilingual LLMs (2025) · 0.00
- Massively Multilingual Sentence Embeddings For Zero-shot Cross-lingual Transfer And Beyond (2018) · 26.33
- Boosting Zero-shot Cross-lingual Retrieval By Training On Artificially Code-switched Data (2023) · 4.52