FineCIR: Explicit Parsing Of Fine-grained Modification Semantics For Composed Image Retrieval
2025 · Zixu Li, Zhiheng Fu, Yupeng Hu, et al.
Abstract
Composed Image Retrieval (CIR) facilitates image retrieval through a multimodal query consisting of a reference image and modification text. The reference image defines the retrieval context, while the modification text specifies desired alterations. However, existing CIR datasets predominantly employ coarse-grained modification text (CoarseMT), which inadequately captures fine-grained retrieval intents. This limitation introduces two key challenges: (1) ignoring detailed differences leads to imprecise positive samples, and (2) greater ambiguity arises when retrieving visually similar images. These issues degrade retrieval accuracy, necessitating manual result filtering or repeated queries. To address these limitations, we develop a robust fine-grained CIR data annotation pipeline that minimizes imprecise positive samples and enhances CIR systems' ability to discern modification intents accurately. Using this pipeline, we refine the FashionIQ and CIRR datasets to create two fine-graine
Related papers
- FIRE-CIR: Fine-grained Reasoning For Composed Fashion Image Retrieval (2026) · 0.00
- A Sanity Check On Composed Image Retrieval (2026) · 0.00
- Facap: A Large-scale Fashion Dataset For Fine-grained Composed Image Retrieval (2025) · 0.00
- Good4cir: Generating Detailed Synthetic Captions For Composed Image Retrieval (2025) · 0.00
- MELT: Improve Composed Image Retrieval Via The Modification Frequentation-rarity Balance Network (2026) · 0.00
- TMCIR: Token Merge Benefits Composed Image Retrieval (2025) · 0.00
- Context-cir: Learning From Concepts In Text For Composed Image Retrieval (2025) · 4.67
- Scale Up Composed Image Retrieval Learning Via Modification Text Generation (2025) · 3.58