Dual-view Curricular Optimal Transport For Cross-lingual Cross-modal Retrieval
2023 · Yabing Wang, Shuhui Wang, Hao Luo, et al.
Abstract
Current research on cross-modal retrieval is mostly English-oriented, owing to the availability of a large number of English-oriented human-labeled vision-language corpora. To overcome the limited availability of non-English labeled data, cross-lingual cross-modal retrieval (CCR) has attracted increasing attention. Most CCR methods construct pseudo-parallel vision-language corpora via Machine Translation (MT) to achieve cross-lingual transfer. However, the sentences produced by MT are generally imperfect descriptions of the corresponding visual content. Improperly assuming that the pseudo-parallel data are correctly correlated makes the networks overfit to the noisy correspondence. Therefore, we propose Dual-view Curricular Optimal Transport (DCOT) to learn with noisy correspondence in CCR. In particular, we quantify the confidence of the sample pair correlation with optimal transport theory from both the cross-lingual and cross-modal views, and design dual-view curriculum learning to dynamically m
Related papers
- A Unified Optimal Transport Framework For Cross-modal Retrieval With Noisy Labels (2024) · 5.24
- CL2CM: Improving Cross-lingual Cross-modal Retrieval Via Cross-lingual Knowledge Transfer (2023) · 8.60
- Improving Cross-lingual Information Retrieval On Low-resource Languages Via Optimal Transport Distillation (2023) · 10.07
- COTS: Collaborative Two-stream Vision-language Pre-training Model For Cross-modal Retrieval (2022) · 13.60
- Improving The Consistency In Cross-lingual Cross-modal Retrieval With 1-to-k Contrastive Learning (2024) · 5.84
- Adversarial Cross-modal Retrieval Via Learning And Transferring Single-modal Similarities (2019) · 8.60
- CoVLR: Coordinating Cross-modal Consistency And Intra-modal Structure For Vision-language Retrieval (2023) · 4.52
- Unsupervised Cross-domain Image Retrieval Via Prototypical Optimal Transport (2024) · 8.09