Improving Composed Image Retrieval Via Contrastive Learning With Scaling Positives And Negatives
2024 · Zhangchi Feng, Richong Zhang, Zhijie Nie
Abstract
The Composed Image Retrieval (CIR) task aims to retrieve target images using a composed query consisting of a reference image and modified text. Advanced methods often use contrastive learning as the optimization objective, which benefits from adequate positive and negative examples. However, CIR triplets incur high manual annotation costs, resulting in limited positive examples. Furthermore, existing methods commonly use in-batch negative sampling, which limits the number of negatives available to the model. To address the shortage of positives, we propose a data generation method that leverages a multi-modal large language model to construct triplets for CIR. To introduce more negatives during fine-tuning, we design a two-stage fine-tuning framework for CIR, whose second stage introduces a large number of static negative representations to optimize the representation space rapidly. These two improvements can be stacked effectively and are designed to be plug-and-play, e
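The effect of adding negatives to a contrastive objective can be illustrated with a small sketch. This is not the authors' implementation: it assumes an InfoNCE-style loss over cosine similarities, and the names `info_nce`, `extra_negatives`, and `temperature` are hypothetical. Each query stands in for a composed (reference image plus text) embedding, the other targets in the batch act as in-batch negatives, and `extra_negatives` stands in for a bank of precomputed, static negative embeddings.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(queries, targets, extra_negatives=(), temperature=0.1):
    """Mean InfoNCE loss where targets[i] is the positive for queries[i].

    All other targets serve as in-batch negatives. `extra_negatives` is a
    hypothetical bank of static negative embeddings appended to the
    candidate set (an assumption made for illustration).
    """
    queries = [normalize(q) for q in queries]
    candidates = [normalize(t) for t in targets]
    candidates += [normalize(n) for n in extra_negatives]

    total = 0.0
    for i, query in enumerate(queries):
        logits = [dot(query, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over positive and negative logits.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


# Toy 2-D embeddings, invented for this example.
queries = [[1.0, 0.0], [0.0, 1.0]]
targets = [[0.9, 0.1], [0.1, 0.9]]
static_bank = [[0.7, 0.7], [-1.0, 0.0]]

loss_in_batch = info_nce(queries, targets)
loss_with_bank = info_nce(queries, targets, static_bank)
```

With the same queries and targets, appending static negatives only adds terms to the softmax denominator, so `loss_with_bank` is larger than `loss_in_batch`. The model then has to separate each positive from more candidates without needing a larger batch.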
Related papers
- Scale Up Composed Image Retrieval Learning Via Modification Text Generation (2025)
- Scaling Prompt Instructed Zero Shot Composed Image Retrieval With Image-only Data (2025)
- NCL-CIR: Noise-aware Contrastive Learning For Composed Image Retrieval (2025)
- Context-cir: Learning From Concepts In Text For Composed Image Retrieval (2025)
- Triplet Synthesis For Enhancing Composed Image Retrieval Via Counterfactual Image Generation (2025)
- Qure: Query-relevant Retrieval Through Hard Negative Sampling In Composed Image Retrieval (2025)
- Automatic Synthesis Of High-quality Triplet Data For Composed Image Retrieval (2025)
- Pseudo-triplet Guided Few-shot Composed Image Retrieval (2024)