Improving Cross-lingual Information Retrieval On Low-resource Languages Via Optimal Transport Distillation
2023 · Zhiqi Huang, Puxuan Yu, James Allan
Abstract
Benefiting from transformer-based pre-trained language models, neural ranking models have made significant progress. More recently, the advent of multilingual pre-trained language models has provided strong support for designing neural cross-lingual retrieval models. However, due to unbalanced pre-training data across languages, multilingual language models have shown a performance gap between high- and low-resource languages in many downstream tasks. Cross-lingual retrieval models built on such pre-trained models can inherit this language bias, leading to suboptimal results for low-resource languages. Moreover, unlike the English-to-English retrieval task, where large-scale training collections for document ranking such as MS MARCO are available, the lack of cross-lingual retrieval data for low-resource languages makes training cross-lingual retrieval models more challenging. In this work, we propose OPTICAL: Optimal Transport distillation for low-resource Cross-lingual in
Related papers
- Translate-distill: Learning Cross-language Dense Retrieval By Translation And Distillation (2024)
- What Drives Cross-lingual Ranking? Retrieval Approaches With Multilingual Language Models (2025)
- Transfer Learning Approaches For Building Cross-language Dense Retrieval Models (2022)
- Boosting Zero-shot Cross-lingual Retrieval By Training On Artificially Code-switched Data (2023)
- Dual-view Curricular Optimal Transport For Cross-lingual Cross-modal Retrieval (2023)
- Boosting Data Utilization For Multilingual Dense Retrieval (2025)
- Parameter-efficient Neural Reranking For Cross-lingual And Multilingual Retrieval (2022)
- Evaluating Multilingual Text Encoders For Unsupervised Cross-lingual Retrieval (2021)