TSVC: Tripartite Learning With Semantic Variation Consistency For Robust Image-text Retrieval
2025 · Shuai Lyu, Zijing Tian, Zhonghong Ou, et al.
Abstract
Cross-modal retrieval maps data from different modalities via semantic relevance. Existing approaches implicitly assume that data pairs are well aligned and ignore widespread annotation noise, i.e., noisy correspondence (NC), which inevitably degrades performance. Although some methods employ the co-teaching paradigm with identical architectures to provide distinct data perspectives, the differences between these architectures stem primarily from random initialization. As a result, the models become increasingly homogeneous as training progresses, which severely limits the additional information this paradigm can provide. To resolve this problem, we introduce Tripartite learning with Semantic Variation Consistency (TSVC) for robust image-text retrieval. We design a tripartite cooperative learning mechanism comprising a Coordinator, a Master, and an Assistant model. The Coordinator distributes data, and the Assistant model supp…
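The co-teaching paradigm that the abstract critiques can be sketched as follows. This is a generic illustration of small-loss co-teaching applied to image-text pairs, not TSVC itself: the bidirectional hinge triplet loss with hardest negatives (VSE++-style), the `keep_ratio` parameter, and the `overlap` measure are assumptions introduced here for illustration. Each peer ranks the annotated pairs by its own loss and hands its lowest-loss (probably clean) pairs to the other peer for training. When the two peers' similarity matrices converge, their selections coincide, which is one way to observe the homogeneity the abstract describes.

```python
import numpy as np

# Generic small-loss co-teaching for image-text pairs (illustrative only,
# NOT the TSVC method). Row i of `sim` is image i, column j is caption j,
# and the annotated (possibly noisy) caption for image i is caption i.


def per_pair_loss(sim: np.ndarray, margin: float = 0.2) -> np.ndarray:
    """Bidirectional hinge triplet loss per annotated pair, using the
    hardest negative in each direction (VSE++-style; an assumption here)."""
    n = sim.shape[0]
    pos = np.diag(sim)
    # Mask the diagonal so only mismatched pairs count as negatives.
    off_diag = np.where(np.eye(n, dtype=bool), -np.inf, sim)
    hardest_caption = off_diag.max(axis=1)  # hardest negative caption per image
    hardest_image = off_diag.max(axis=0)    # hardest negative image per caption
    return (np.maximum(0.0, margin - pos + hardest_caption)
            + np.maximum(0.0, margin - pos + hardest_image))


def small_loss_indices(loss: np.ndarray, keep_ratio: float) -> set:
    """Indices of the `keep_ratio` fraction of pairs with the lowest loss,
    treated as probably clean (hypothetical helper)."""
    k = max(1, int(round(keep_ratio * len(loss))))
    return set(np.argsort(loss)[:k].tolist())


def co_teaching_split(sim_a, sim_b, keep_ratio=0.7):
    """Each peer selects its small-loss pairs for the OTHER peer to train on.
    `overlap` (hypothetical) is the fraction of peer A's training set that also
    appears in peer B's; values near 1.0 mean the peers no longer disagree."""
    train_b = small_loss_indices(per_pair_loss(sim_a), keep_ratio)  # chosen by A
    train_a = small_loss_indices(per_pair_loss(sim_b), keep_ratio)  # chosen by B
    overlap = len(train_a & train_b) / len(train_a)
    return train_a, train_b, overlap


if __name__ == "__main__":
    # Toy 4x4 similarity matrix; pair 3 is a mismatched (noisy) annotation.
    sim = np.array([[0.90, 0.10, 0.20, 0.10],
                    [0.10, 0.80, 0.10, 0.20],
                    [0.20, 0.10, 0.85, 0.10],
                    [0.60, 0.30, 0.20, 0.10]])
    print(per_pair_loss(sim))
    print(co_teaching_split(sim, sim.copy(), keep_ratio=0.75))
```

With `keep_ratio=0.75` on this toy matrix, the mismatched pair 3 receives the largest loss and is excluded from both training sets, while two peers with identical similarity matrices select exactly the same pairs (overlap 1.0). That is the degenerate case in which the second peer contributes no distinct perspective.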
Related papers
- Preserving Semantic Neighborhoods For Robust Cross-modal Retrieval (2020) · 10.07
- Discriminative Semantic Transitive Consistency For Cross-modal Learning (2021) · 0.00
- Robust Remote Sensing Image-text Retrieval With Noisy Correspondence (2026) · 1.24
- COTS: Collaborative Two-stream Vision-language Pre-training Model For Cross-modal Retrieval (2022) · 13.60
- Cross-modal Coherence For Text-to-image Retrieval (2021) · 6.77
- MCAD: Multi-teacher Cross-modal Alignment Distillation For Efficient Image-text Retrieval (2023) · 3.58
- Webly Supervised Joint Embedding For Cross-modal Image-text Retrieval (2018) · 13.17
- Maximal Matching Matters: Preventing Representation Collapse For Robust Cross-modal Retrieval (2025) · 2.26