Collaborative Group: Composed Image Retrieval Via Consensus Learning From Noisy Annotations
2023 · Xu Zhang, Zhedong Zheng, Linchao Zhu, et al.
Abstract
Composed image retrieval extends content-based image retrieval systems by enabling users to search using reference images and captions that describe their intention. Despite great progress in developing image-text compositors to extract discriminative visual-linguistic features, we identify a hitherto overlooked issue, triplet ambiguity, which impedes robust feature extraction. Triplet ambiguity refers to a type of semantic ambiguity that arises between the reference image, the relative caption, and the target image. It is mainly due to the limited representation of the annotated text, resulting in many noisy triplets where multiple visually dissimilar candidate images can be matched to an identical reference pair (i.e., a reference image + a relative caption). To address this challenge, we propose the Consensus Network (Css-Net), inspired by the psychological concept that groups outperform individuals. Css-Net comprises two core components: (1) a consensus module with four diverse com
Related papers
- Conesep: Cone-based Robust Noise-unlearning Compositional Network For Composed Image Retrieval (2026) 0.00
- HABIT: Chrono-synergia Robust Progressive Learning Framework For Composed Image Retrieval (2026) 2.35
- Composed Image Retrieval Using Contrastive Learning And Task-oriented Clip-based Features (2023) 16.84
- Pseudo-triplet Guided Few-shot Composed Image Retrieval (2024) 0.00
- Air-know: Arbiter-calibrated Knowledge-internalizing Robust Network For Composed Image Retrieval (2026) 0.00
- Cala: Complementary Association Learning For Augmenting Composed Image Retrieval (2024) 9.41
- INTENT: Invariance And Discrimination-aware Noise Mitigation For Robust Composed Image Retrieval (2026) 0.00
- Improving Composed Image Retrieval Via Contrastive Learning With Scaling Positives And Negatives (2024) 11.30