PC\(^2\): Pseudo-classification Based Pseudo-captioning For Noisy Correspondence Learning In Cross-modal Retrieval
2024 · Yue Duan, Zhangxuan Gu, Zhenzhe Ying, et al.
Abstract
In the realm of cross-modal retrieval, seamlessly integrating diverse modalities within multimedia remains a formidable challenge, especially given the complexities introduced by noisy correspondence learning (NCL). Such noise often stems from mismatched data pairs, a significant obstacle distinct from traditional noisy labels. This paper introduces the Pseudo-Classification based Pseudo-Captioning (PC\(^2\)) framework to address this challenge. PC\(^2\) offers a threefold strategy. First, it establishes an auxiliary "pseudo-classification" task that interprets captions as categorical labels, steering the model to learn image-text semantic similarity through a non-contrastive mechanism. Second, unlike prevailing margin-based techniques, it capitalizes on this pseudo-classification capability to generate pseudo-captions that provide more informative and tangible supervision for each mismatched pair. Thirdly, the oscillation of pseudo-classification is borrowed to assistant t