INTENT: Invariance And Discrimination-aware Noise Mitigation For Robust Composed Image Retrieval
2026 · Zhiwei Chen, Yupeng Hu, Zhiheng Fu, et al.
Abstract
Composed Image Retrieval (CIR) is a challenging image retrieval paradigm that retrieves target images from multimodal queries, each consisting of a reference image and a modification text. Although substantial progress has been made in recent years, existing methods assume that all samples are correctly matched. In real-world scenarios, however, the high cost of annotating triplets means that CIR datasets inevitably contain annotation errors, which produce incorrectly matched triplets. As a result, the problem of Noisy Triplet Correspondence (NTC) has attracted growing attention. We argue that noise in CIR can be categorized into two types: cross-modal correspondence noise and modality-inherent noise. The former arises from mismatches across modalities, whereas the latter originates from intra-modal background interference or visual factors irrelevant to the coarse-grained modification annotations. However, modality-inherent noise is often overlooked, and research on cross-modal c
Related papers
- Conesep: Cone-based Robust Noise-unlearning Compositional Network For Composed Image Retrieval (2026) · 0.00
- NCL-CIR: Noise-aware Contrastive Learning For Composed Image Retrieval (2025) · 2.26
- HABIT: Chrono-synergia Robust Progressive Learning Framework For Composed Image Retrieval (2026) · 2.35
- HINT: Composed Image Retrieval With Dual-path Compositional Contextualized Network (2026) · 0.78
- Air-know: Arbiter-calibrated Knowledge-internalizing Robust Network For Composed Image Retrieval (2026) · 0.00
- Heterogeneous Uncertainty-guided Composed Image Retrieval With Fine-grained Probabilistic Learning (2026) · 0.00
- MELT: Improve Composed Image Retrieval Via The Modification Frequentation-rarity Balance Network (2026) · 0.00
- CSMCIR: Cot-enhanced Symmetric Alignment With Memory Bank For Composed Image Retrieval (2026) · 0.00