Soft Prompt Decoding For Multilingual Dense Retrieval
2023 · Zhiqi Huang, Hansi Zeng, Hamed Zamani, et al.
Abstract
In this work, we explore a Multilingual Information Retrieval (MLIR) task, where the collection includes documents in multiple languages. We demonstrate that applying state-of-the-art approaches developed for cross-lingual information retrieval to MLIR tasks leads to sub-optimal performance. This is due to the heterogeneous and imbalanced nature of multilingual collections -- some languages are better represented in the collection and some benefit from large-scale training data. To address this issue, we present KD-SPD, a novel soft prompt decoding approach for MLIR that implicitly "translates" the representation of documents in different languages into the same embedding space. To address the challenges of data scarcity and imbalance, we introduce a knowledge distillation strategy. The teacher model is trained on rich English retrieval data, and by leveraging bi-text data, our distillation framework transfers its retrieval knowledge to the multilingual document encoder. Therefore, our
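The distillation idea described in the abstract (a teacher trained on English data guiding a multilingual document encoder through bi-text pairs) can be illustrated with a toy sketch. This is not the KD-SPD implementation: the mean-squared-error objective, the linear student, the function names `student_encode` and `distill_step`, and all numbers are assumptions chosen only to make the idea concrete. The abstract does not specify the actual loss or architecture.

```python
# Toy sketch of embedding-level knowledge distillation over bi-text pairs.
# Assumptions (not from the paper): an MSE objective, a linear student,
# and made-up feature and embedding values.
import random


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def student_encode(weights, features):
    """Hypothetical linear student: one dot product per output dimension."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def distill_step(weights, features, teacher_vec, lr=0.1):
    """One gradient step pulling the student output toward the teacher's.

    Returns the loss measured before the update.
    """
    out = student_encode(weights, features)
    n = len(out)
    for i, row in enumerate(weights):
        # d(MSE)/d(out[i]) = 2 * (out[i] - teacher[i]) / n
        grad_scale = 2.0 * (out[i] - teacher_vec[i]) / n
        for j in range(len(row)):
            row[j] -= lr * grad_scale * features[j]
    return mse(out, teacher_vec)


random.seed(0)

# Each pair: (features of a non-English document, frozen teacher embedding
# of its English translation). The teacher is never updated.
bitext = [
    ([1.0, 0.0, 0.5], [0.2, -0.1]),
    ([0.0, 1.0, 0.5], [-0.3, 0.4]),
]

# Student weights: 2 output dimensions x 3 input features.
weights = [[random.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(2)]

losses = []
for epoch in range(200):
    epoch_loss = sum(distill_step(weights, f, t) for f, t in bitext) / len(bitext)
    losses.append(epoch_loss)
```

In this sketch the student only ever sees the non-English side of each pair, while its training target comes from the English side. That is the sense in which retrieval knowledge learned from English data could be transferred without labeled retrieval data in the other languages.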
Related papers
- Soft Prompt Tuning For Augmenting Dense Retrieval With Large Language Models (2023) · 9.41
- Boosting Data Utilization For Multilingual Dense Retrieval (2025) · 0.00
- Translate-distill: Learning Cross-language Dense Retrieval By Translation And Distillation (2024) · 8.60
- Elevating All Zero-shot Sketch-based Image Retrieval Through Multimodal Prompt Learning (2024) · 6.34
- Scaling Sparse And Dense Retrieval In Decoder-only LLMs (2025) · 6.34
- SLQ: Bridging Modalities Via Shared Latent Queries For Retrieval With Frozen MLLMs (2026) · 0.00
- C2KD: Cross-lingual Cross-modal Knowledge Distillation For Multilingual Text-video Retrieval (2022) · 8.94
- Context-adaptive Multi-prompt Embedding With Large Language Models For Vision-language Alignment (2025) · 0.00