Learning To Rematch Mismatched Pairs For Robust Cross-modal Retrieval
2024 · Haochen Han, Qinghua Zheng, Guang Dai, et al.
Abstract
Collecting well-matched multimedia datasets is crucial for training cross-modal retrieval models. However, in real-world scenarios, massive amounts of multimodal data are harvested from the Internet, and these data inevitably contain Partially Mismatched Pairs (PMPs). Such semantically irrelevant data markedly harm cross-modal retrieval performance. Previous efforts tend to mitigate this problem by estimating a soft correspondence that down-weights the contribution of PMPs. In this paper, we address this challenge from a new perspective: the potential semantic similarity among unpaired samples makes it possible to extract useful knowledge from mismatched pairs. To achieve this, we propose L2RM, a general framework based on Optimal Transport (OT) that learns to rematch mismatched pairs. Specifically, L2RM generates refined alignments by seeking a minimal-cost transport plan across different modalities. To formalize the rematching idea in OT, we first propose a self-supervis…
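The abstract frames rematching as finding a minimal-cost transport plan between the two modalities. The following is a minimal sketch of that general idea, not L2RM itself. It assumes entropic-regularized OT solved with Sinkhorn iterations, uniform marginals over images and texts, and a cost of 1 minus cosine similarity between precomputed embeddings. The functions `cosine_cost`, `sinkhorn_plan`, and `rematch` and the toy embeddings are hypothetical illustrations; L2RM's own cost design and constraints are not reproduced here.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine_cost(xs: Sequence[Sequence[float]], ys: Sequence[Sequence[float]]) -> Matrix:
    """C[i][j] = 1 - cos(x_i, y_j). Hypothetical stand-in for a learned cost."""
    def norm(v: Sequence[float]) -> float:
        return math.sqrt(sum(c * c for c in v))
    return [[1.0 - sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y)) for y in ys]
            for x in xs]


def sinkhorn_plan(cost: Matrix, epsilon: float = 0.05, n_iters: int = 200) -> Matrix:
    """Entropic OT plan with uniform marginals, computed by Sinkhorn scaling."""
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m                          # uniform mass per sample
    K = [[math.exp(-c / epsilon) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns so the plan matches both marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def rematch(plan: Matrix) -> List[int]:
    """For each image i, pick the text that receives the most transported mass."""
    return [max(range(len(row)), key=row.__getitem__) for row in plan]


# Toy data: the given pairing is image i <-> text i, but texts 1 and 2 are swapped.
images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
texts = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.1, 0.9, 0.0]]

plan = sinkhorn_plan(cosine_cost(images, texts))
new_partner = rematch(plan)
mismatched = [i for i, j in enumerate(new_partner) if j != i]
print(new_partner, mismatched)  # [0, 2, 1] [1, 2]
```

Instead of taking a hard argmax, one could row-normalize `plan` to obtain soft alignment targets, which is closer in spirit to the "refined alignments" the abstract describes; the smaller `epsilon` is, the closer the plan gets to a hard one-to-one matching.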
Related papers
- Maximal Matching Matters: Preventing Representation Collapse For Robust Cross-modal Retrieval (2025)
- A Unified Optimal Transport Framework For Cross-modal Retrieval With Noisy Labels (2024)
- Pmpguard: Catching Pseudo-matched Pairs In Remote Sensing Image-text Retrieval (2025)
- Rematch: Boosting Representation Through Matching For Multimodal Retrieval (2025)
- Adversarial Cross-modal Retrieval Via Learning And Transferring Single-modal Similarities (2019)
- Swamp: Swapped Assignment Of Multi-modal Pairs For Cross-modal Retrieval (2021)
- Preserving Semantic Neighborhoods For Robust Cross-modal Retrieval (2020)
- Deep Reversible Consistency Learning For Cross-modal Retrieval (2025)