Improving The Consistency In Cross-lingual Cross-modal Retrieval With 1-to-k Contrastive Learning
2024 · Zhijie Nie, Richong Zhang, Zhangchi Feng, et al.
Abstract
Cross-lingual Cross-modal Retrieval (CCR) is an essential task in web search. It aims to break the barriers between modality and language simultaneously and to achieve image-text retrieval in the multilingual scenario with a single model. In recent years, excellent progress has been made based on cross-lingual cross-modal pre-training; in particular, methods based on contrastive learning on large-scale data have significantly improved retrieval performance. However, these methods directly follow existing pre-training methods from the cross-lingual or cross-modal domain, leading to two inconsistency problems in CCR. Methods in the cross-lingual style suffer from intra-modal error propagation, resulting in inconsistent recall performance across languages over the whole dataset. Methods in the cross-modal style suffer from inter-modal optimization direction bias, resulting in inconsistent ranks across languages within each instance, which cannot be reflected by Recall@K. To s
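The abstract's point that Recall@K cannot reflect rank inconsistency within an instance can be illustrated with a small sketch. The similarity scores, the two language labels, and the helper names `rank_of` and `recall_at_k` are illustrative assumptions, not taken from the paper.

```python
def rank_of(scores, target):
    """1-based rank of `target` when candidates are sorted by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order.index(target) + 1


def recall_at_k(ranks, k):
    """Fraction of queries whose correct item is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


# Hypothetical similarities between one caption, written in two languages,
# and five candidate images. Image 0 is the correct match in both cases.
scores_en = [0.90, 0.40, 0.30, 0.20, 0.10]
scores_de = [0.50, 0.70, 0.60, 0.55, 0.10]

rank_en = rank_of(scores_en, target=0)
rank_de = rank_of(scores_de, target=0)

# Both queries count as hits at K=5, so Recall@5 is identical per language,
# even though the correct image sits at different ranks.
recall_en = recall_at_k([rank_en], k=5)
recall_de = recall_at_k([rank_de], k=5)

print(rank_en, rank_de, recall_en, recall_de)
```

In this toy case the same instance is ranked differently depending on the query language, yet the per-language Recall@5 values agree, so the metric alone does not expose the gap.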
Related papers
- Generalized Contrastive Learning For Universal Multimodal Retrieval (2025) 0.00
- Covlr: Coordinating Cross-modal Consistency And Intra-modal Structure For Vision-language Retrieval (2023) 4.52
- Deep Reversible Consistency Learning For Cross-modal Retrieval (2025) 7.81
- C3: Continued Pretraining With Contrastive Weak Supervision For Cross Language Ad-hoc Retrieval (2022) 8.35
- CL2CM: Improving Cross-lingual Cross-modal Retrieval Via Cross-lingual Knowledge Transfer (2023) 8.60
- Generalized Contrastive Learning For Multi-modal Retrieval And Ranking (2024) 6.01
- A Comprehensive Empirical Study Of Vision-language Pre-trained Model For Supervised Cross-modal Retrieval (2022) 0.00
- Normalized Contrastive Learning For Text-video Retrieval (2022) 6.77