Swamp: Swapped Assignment Of Multi-modal Pairs For Cross-modal Retrieval
2021 · Minyoung Kim
Abstract
We tackle the cross-modal retrieval problem, where learning is supervised only by relevant multi-modal pairs in the data. Although contrastive learning is the most popular approach for this task, it makes the potentially wrong assumption that instances in different pairs are automatically irrelevant. To address this issue, we propose a novel loss function based on self-labeling of the unknown semantic classes. Specifically, we aim to predict class labels of the data instances in each modality, and assign those labels to the corresponding instances in the other modality (i.e., swapping the pseudo labels). With these swapped labels, we learn the data embedding for each modality using the supervised cross-entropy loss. This way, cross-modal instances from different pairs that are semantically related can be aligned to each other by the class predictor. We tested our approach on several real-world cross-modal retrieval problems, including text-based video retrieval, sketch-based
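The swapped-label idea described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the paper's implementation: the function name `swapped_cross_entropy`, the shared linear class predictor `prototypes`, and the use of plain softmax outputs as soft pseudo labels are all assumptions made here for illustration. The abstract does not say how the pseudo labels are actually computed.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def swapped_cross_entropy(emb_a, emb_b, prototypes, eps=1e-12):
    """Swapped-label cross-entropy for a batch of paired embeddings.

    emb_a, emb_b : (batch, dim) embeddings of the two modalities;
                   row i of emb_a is paired with row i of emb_b.
    prototypes   : (num_classes, dim) weights of a hypothetical shared
                   linear class predictor (an assumption of this sketch).
    """
    # Class predictions for each modality.
    p_a = softmax(emb_a @ prototypes.T)
    p_b = softmax(emb_b @ prototypes.T)

    # Pseudo labels: here simply the soft predictions, treated as fixed
    # targets (in an autodiff framework they would be detached).
    q_a, q_b = p_a.copy(), p_b.copy()

    # Swap: modality A is trained on labels from B, and vice versa.
    loss_a = -(q_b * np.log(p_a + eps)).sum(axis=1).mean()
    loss_b = -(q_a * np.log(p_b + eps)).sum(axis=1).mean()
    return 0.5 * (loss_a + loss_b)


# Usage: three paired instances, three classes.
prototypes = 5.0 * np.eye(3)
emb_text = np.eye(3)
emb_video = np.eye(3)
aligned = swapped_cross_entropy(emb_text, emb_video, prototypes)
misaligned = swapped_cross_entropy(emb_text, np.roll(emb_video, 1, axis=0), prototypes)
print(aligned < misaligned)
```

In this toy usage the loss is lower when paired instances receive the same class prediction than when the pairing is shuffled. Note that using raw predictions as targets permits a degenerate solution in which every instance is assigned the same class; a practical method would presumably need some mechanism to prevent this, which the sketch omits.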
Related papers
- Adversarial Cross-modal Retrieval Via Learning And Transferring Single-modal Similarities (2019) 8.60
- Semi-supervised Cross-modal Retrieval With Label Prediction (2018) 11.29
- Discriminative Semantic Transitive Consistency For Cross-modal Learning (2021) 0.00
- Learning To Rematch Mismatched Pairs For Robust Cross-modal Retrieval (2024) 13.82
- Maximal Matching Matters: Preventing Representation Collapse For Robust Cross-modal Retrieval (2025) 2.26
- Label Prediction Framework For Semi-supervised Cross-modal Retrieval (2019) 5.24
- Cross-modal Coordination Across A Diverse Set Of Input Modalities (2024) 0.00
- Preserving Semantic Neighborhoods For Robust Cross-modal Retrieval (2020) 10.07