Augmenting Passage Representations With Query Generation For Enhanced Cross-lingual Dense Retrieval
2023 · Shengyao Zhuang, Linjun Shou, Guido Zuccon
Abstract
Effective cross-lingual dense retrieval methods that rely on multilingual pre-trained language models (PLMs) need to be trained to encompass both the relevance matching task and the cross-language alignment task. However, cross-lingual training data is often scarce. In this paper, rather than using more cross-lingual data for training, we propose to use cross-lingual query generation to augment passage representations with queries in languages other than the original passage language. These augmented representations are used at inference time so that each passage representation encodes more information across the different target languages. Training a cross-lingual query generator requires no training data beyond that already used for the dense retriever. The query generator training is also effective because the pre-training task for the generator (T5 text-to-text training) is very similar to the fine-tuning task (generation of a query). The use of the generator does
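As a rough illustration of the inference-time augmentation described above, the sketch below fuses a passage embedding with the embeddings of queries generated for that passage in other languages, then ranks passages by dot product. The abstract does not say how the vectors are combined: the convex combination controlled by `alpha`, and the helper names `mean_vector`, `augment_passage` and `rank`, are hypothetical assumptions for illustration only. The input vectors stand in for the outputs of a multilingual dense encoder applied to the passage and to T5-generated queries, neither of which is shown here.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def augment_passage(
    passage_emb: Vector, query_embs: Sequence[Vector], alpha: float = 0.5
) -> Vector:
    """Hypothetical fusion: alpha * passage + (1 - alpha) * mean(generated queries).

    query_embs are embeddings of queries generated (assumed, in other
    languages) from the passage. With no generated queries the passage
    embedding is returned unchanged.
    """
    if not query_embs:
        return list(passage_emb)
    q_mean = mean_vector(query_embs)
    return [alpha * p + (1 - alpha) * q for p, q in zip(passage_emb, q_mean)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def rank(query_emb: Vector, passage_embs: Sequence[Vector]) -> List[int]:
    """Indices of passages sorted by descending dot-product score."""
    scores = [dot(query_emb, p) for p in passage_embs]
    return sorted(range(len(passage_embs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy 2-d space: dimension 0 ~ "English passage", dimension 1 ~ "German query".
    original = [1.0, 0.0]
    augmented = augment_passage(original, [[0.0, 1.0]])
    print(augmented)                                  # [0.5, 0.5]
    print(rank([0.0, 1.0], [original, augmented]))    # [1, 0]
```

In this toy setting a query in the second "language" scores zero against the original passage vector but matches the augmented one, which is the intuition behind encoding generated cross-lingual queries into the passage representation. Because the fusion happens once per passage at indexing time, query-time cost is unchanged.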
Related papers
- Empowering Dual-encoder With Query Generator For Cross-lingual Dense Retrieval (2023)
- Expandr: Teaching Dense Retrievers Beyond Queries With LLM Guidance (2025)
- Query-as-context Pre-training For Dense Passage Retrieval (2022)
- Dense Passage Retrieval: Is It Retrieving? (2024)
- Bridging Language Gaps: Advances In Cross-lingual Information Retrieval With Multilingual LLMs (2025)
- Don't Retrieve, Generate: Prompting LLMs For Synthetic Training Data In Dense Retrieval (2025)
- Embedding-based Zero-shot Retrieval Through Query Generation (2020)
- Translate-distill: Learning Cross-language Dense Retrieval By Translation And Distillation (2024)