A Sanity Check On Composed Image Retrieval
2026 · Yikun Liu, Jiangchao Yao, Weidi Xie, et al.
Abstract
Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that specifies the desired modification. Despite the rapid development of CIR models, their performance is not well characterized by existing benchmarks. These benchmarks inherently contain indeterminate queries that degrade the evaluation (i.e., multiple candidate images, rather than solely the target image, meet the query criteria), and they do not consider the models' effectiveness in a multi-round system. Motivated by this, we improve the evaluation procedure in two respects: 1) we introduce FISD, a Fully-Informed Semantically-Diverse benchmark, which employs generative models to precisely control the variables of reference-target image pairs, enabling a more accurate evaluation of CIR methods across six dimensions, without query ambiguity; 2) we propose an automatic multi-round agentic evaluation framework to probe the potential of the exist
Related papers
- Instance-level Composed Image Retrieval (2025) · 0.00
- Rethinking Composed Image Retrieval Evaluation: A Fine-grained Benchmark From Image Editing (2026) · 0.00
- HINT: Composed Image Retrieval With Dual-path Compositional Contextualized Network (2026) · 0.78
- Good4cir: Generating Detailed Synthetic Captions For Composed Image Retrieval (2025) · 0.00
- Beyond Semantic Search: Towards Referential Anchoring In Composed Image Retrieval (2026) · 0.00
- Infocir: Multimedia Analysis For Composed Image Retrieval (2026) · 1.24
- Context-cir: Learning From Concepts In Text For Composed Image Retrieval (2025) · 4.67
- DAFM: Dynamic Adaptive Fusion For Multi-model Collaboration In Composed Image Retrieval (2025) · 0.00