CSMCIR: Cot-enhanced Symmetric Alignment With Memory Bank For Composed Image Retrieval
2026 Β· Zhipeng Qian, Zihan Liang, Yufei Ma, et al.
Abstract
Composed Image Retrieval (CIR) enables users to search for target images using both a reference image and manipulation text, offering substantial advantages over single-modality retrieval systems. However, existing CIR methods suffer from representation space fragmentation: queries and targets comprise heterogeneous modalities and are processed by distinct encoders, forcing models to bridge misaligned representation spaces only through post-hoc alignment, which fundamentally limits retrieval performance. This architectural asymmetry manifests as three distinct, well-separated clusters in the feature space, directly demonstrating how heterogeneous modalities create fundamentally misaligned representation spaces from initialization. In this work, we propose CSMCIR, a unified representation framework that achieves efficient query-target alignment through three synergistic components. First, we introduce a Multi-level Chain-of-Thought (MCoT) prompting strategy that guides Multimodal Large
Authors
(none)
Tags
Stats
Related papers
- TMCIR: Token Merge Benefits Composed Image Retrieval (2025)0.00
- HINT: Composed Image Retrieval With Dual-path Compositional Contextualized Network (2026)0.78
- DAFM: Dynamic Adaptive Fusion For Multi-model Collaboration In Composed Image Retrieval (2025)0.00
- Infocir: Multimedia Analysis For Composed Image Retrieval (2026)1.24
- Mcot-mvs: Multi-level Vision Selection By Multi-modal Chain-of-thought Reasoning For Composed Image Retrieval (2026)0.00
- Cir-cot: Towards Interpretable Composed Image Retrieval Via End-to-end Chain-of-thought Reasoning (2025)0.00
- Cala: Complementary Association Learning For Augmenting Composed Image Retrieval (2024)9.41
- From Mapping To Composing: A Two-stage Framework For Zero-shot Composed Image Retrieval (2025)0.00