HINT: Composed Image Retrieval With Dual-path Compositional Contextualized Network
2026 · Mingyu Zhang, Zixu Li, Zhiwei Chen, et al.
Abstract
Composed Image Retrieval (CIR) is a challenging image retrieval paradigm. Given a multimodal query composed of a reference image and a modification text, it aims to retrieve, from a large-scale image database, the target images that are consistent with the modification semantics. Although existing methods have made significant progress in cross-modal alignment and feature fusion, a key flaw remains: they neglect contextual information when discriminating matching samples. Addressing this limitation is difficult because of two challenges: 1) implicit dependencies and 2) the lack of a differential amplification mechanism. To address these challenges, we propose a dual-patH composItional coNtextualized neTwork (HINT), which performs contextualized encoding and amplifies the similarity differences between matching and non-matching samples, thus raising the performance ceiling of CIR models in complex scenarios. Our HINT model achieves the best performance on all metrics across two CI
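To make the retrieval setup in the abstract concrete, the following is a minimal sketch of a generic CIR pipeline; it is not the HINT architecture. It assumes precomputed embeddings, a placeholder fusion (element-wise sum), cosine-similarity ranking, and a temperature-scaled softmax as one common generic way to widen the gap between matching and non-matching scores. All function names, vectors, and the temperature values are hypothetical and chosen only for illustration.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length; leave all-zero vectors unchanged.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def compose_query(ref_image_emb, mod_text_emb):
    # Hypothetical fusion: element-wise sum. A placeholder only,
    # not the fusion used by HINT or any specific CIR model.
    return [r + t for r, t in zip(ref_image_emb, mod_text_emb)]

def rank_gallery(query_emb, gallery):
    # Score every candidate image and return names sorted best-first.
    scores = {name: cosine(query_emb, emb) for name, emb in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

def softmax(values, temperature=1.0):
    # Temperature-scaled softmax; a lower temperature sharpens the
    # distribution, enlarging the gap between high and low scores.
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings (assumed, 3-dimensional for readability).
reference_image = [1.0, 0.0, 0.0]
modification_text = [0.0, 1.0, 0.0]
gallery = {
    "target": [0.9, 1.1, 0.0],          # reflects image and modification
    "reference_like": [1.0, 0.0, 0.1],  # looks like the reference only
    "unrelated": [0.0, 0.0, 1.0],
}

query = compose_query(reference_image, modification_text)
ranking, scores = rank_gallery(query, gallery)
ordered_scores = [scores[name] for name in ranking]
probs_soft = softmax(ordered_scores, temperature=1.0)
probs_sharp = softmax(ordered_scores, temperature=0.1)
print(ranking)
```

In this toy setup the candidate that reflects both the reference image and the modification ranks first, and lowering the temperature concentrates more probability mass on it. How HINT itself encodes context and amplifies similarity differences is not specified in the text available here.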
Related papers
- HABIT: Chrono-synergia Robust Progressive Learning Framework For Composed Image Retrieval (2026) 2.35
- NCL-CIR: Noise-aware Contrastive Learning For Composed Image Retrieval (2025) 2.26
- DAFM: Dynamic Adaptive Fusion For Multi-model Collaboration In Composed Image Retrieval (2025) 0.00
- InfoCIR: Multimedia Analysis For Composed Image Retrieval (2026) 1.24
- TMCIR: Token Merge Benefits Composed Image Retrieval (2025) 0.00
- CSMCIR: CoT-enhanced Symmetric Alignment With Memory Bank For Composed Image Retrieval (2026) 0.00
- ConText-CIR: Learning From Concepts In Text For Composed Image Retrieval (2025) 4.67
- A Sanity Check On Composed Image Retrieval (2026) 0.00