CODER: Coupled Diversity-sensitive Momentum Contrastive Learning For Image-text Retrieval
2022 · Haoran Wang, Dongliang He, Wenhao Wu, et al.
Abstract
Image-Text Retrieval (ITR) is challenging because it must bridge the visual and linguistic modalities. Most prior work adopts contrastive learning. Besides the limited number of negative image-text pairs, the capability of contrastive learning is restricted by manual weighting of negative pairs and by its unawareness of external knowledge. In this paper, we propose Coupled Diversity-Sensitive Momentum Contrastive Learning (CODER) to improve cross-modal representation. First, we introduce a novel diversity-sensitive contrastive learning (DCL) architecture. We use dynamic dictionaries for both modalities to enlarge the pool of image-text pairs, and we achieve diversity sensitivity through adaptive negative-pair weighting. Furthermore, CODER has two branches. One learns instance-level embeddings from images and texts, and it also generates pseudo online clustering labels for its input images and texts based on their embeddings. Meanwhile, the other branch learns to query fro…
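The diversity-sensitive contrastive learning idea described in the abstract (a dynamic dictionary per modality plus adaptive weighting of negatives) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the MoCo-style FIFO queue and EMA momentum update, the hardness-based weight w_j ∝ exp(β·s_j) rescaled to mean 1, and every name (`ModalityQueue`, `weighted_infonce`, `coupled_dcl_loss`, `tau`, `beta`) are hypothetical choices made here, since the abstract does not state CODER's actual weighting formula. The clustering-label and knowledge branches are omitted.

```python
import math
from collections import deque
from typing import Deque, List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """L2-normalize so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against a zero vector
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def momentum_update(online: Sequence[float], target: List[float], m: float = 0.999) -> None:
    """Assumed MoCo-style EMA, in place: target <- m * target + (1 - m) * online."""
    for i, p_online in enumerate(online):
        target[i] = m * target[i] + (1.0 - m) * p_online


class ModalityQueue:
    """Hypothetical dynamic dictionary: a FIFO of momentum-encoder keys for one modality."""

    def __init__(self, size: int) -> None:
        self.keys: Deque[Vector] = deque(maxlen=size)  # oldest keys drop out automatically

    def enqueue(self, batch_keys: Sequence[Sequence[float]]) -> None:
        for k in batch_keys:
            self.keys.append(normalize(k))


def weighted_infonce(query, positive, negatives, tau: float = 0.07, beta: float = 1.0) -> float:
    """InfoNCE with adaptively weighted negatives.

    ASSUMPTION (not taken from the paper): w_j is proportional to exp(beta * s_j),
    rescaled to mean 1, so negatives more similar to the query weigh more.
    beta = 0 gives every negative weight 1, i.e. plain InfoNCE.
    """
    if not negatives:
        return 0.0  # only the positive in the denominator: -log(1) = 0
    s_pos = cosine(query, positive)
    s_neg = [cosine(query, k) for k in negatives]
    raw = [math.exp(beta * s) for s in s_neg]
    mean_raw = sum(raw) / len(raw)
    pos_term = math.exp(s_pos / tau)
    neg_term = sum((r / mean_raw) * math.exp(s / tau) for r, s in zip(raw, s_neg))
    return -math.log(pos_term / (pos_term + neg_term))


def coupled_dcl_loss(img_q, img_k, txt_q, txt_k, img_queue, txt_queue,
                     tau: float = 0.07, beta: float = 1.0) -> float:
    """Symmetric image->text and text->image loss, then refresh both queues.

    img_q / txt_q: online-encoder embeddings; img_k / txt_k: momentum-encoder keys.
    Row b of every batch is one matched image-text pair.
    """
    txt_negs, img_negs = list(txt_queue.keys), list(img_queue.keys)
    n = len(img_q)
    i2t = sum(weighted_infonce(img_q[b], txt_k[b], txt_negs, tau, beta) for b in range(n)) / n
    t2i = sum(weighted_infonce(txt_q[b], img_k[b], img_negs, tau, beta) for b in range(n)) / n
    img_queue.enqueue(img_k)  # current keys become negatives for later batches
    txt_queue.enqueue(txt_k)
    return 0.5 * (i2t + t2i)
```

Setting `beta=0.0` makes every weight 1 and recovers standard InfoNCE, which is a convenient sanity check; a larger `beta` shifts the loss toward the queued negatives most similar to the query. A real implementation would operate on batched tensors in an autograd framework rather than Python lists.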
Related papers
- Dynamic Contrastive Distillation For Image-text Retrieval (2022)
- Dual-modal Attention-enhanced Text-video Retrieval With Triplet Partial Margin Contrastive Learning (2023)
- Loopitr: Combining Dual And Cross Encoder Architectures For Image-text Retrieval (2022)
- Intra-modal Constraint Loss For Image-text Retrieval (2022)
- ITO: Images And Texts As One Via Synergizing Multiple Alignment And Training-time Fusion (2026)
- Generalized Contrastive Learning For Universal Multimodal Retrieval (2025)
- Normalized Contrastive Learning For Text-video Retrieval (2022)
- Keyword-based Diverse Image Retrieval By Semantics-aware Contrastive Learning And Transformer (2023)