Instance-level Composed Image Retrieval
2025 · Bill Psomas, George Retsinas, Nikos Efthymiadis, et al.
Abstract
The progress of composed image retrieval (CIR), a popular research direction in image retrieval, where a combined visual and textual query is used, is held back by the absence of high-quality training and evaluation data. We introduce a new evaluation dataset, i-CIR, which, unlike existing datasets, focuses on an instance-level class definition. The goal is to retrieve images that contain the same particular object as the visual query, presented under a variety of modifications defined by textual queries. Its design and curation process keep the dataset compact to facilitate future research, while maintaining its challenge (comparable to retrieval among more than 40M random distractors) through a semi-automated selection of hard negatives. To overcome the challenge of obtaining clean, diverse, and suitable training data, we leverage pre-trained vision-and-language models (VLMs) in a training-free approach called BASIC. The method separately estimates query-image-to-image and query-text
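To make the idea of scoring the two query parts separately concrete, here is a minimal sketch. It is not the BASIC method: the abstract is cut off before it says how the two estimates are formed or combined, so the toy embeddings, the function names `cosine` and `composed_scores`, and the fusion rule (a product of rescaled similarities) are all assumptions chosen for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def composed_scores(image_query, text_query, database):
    """Score each database embedding against both parts of the query.

    Returns (index, image_sim, text_sim, fused) tuples sorted by fused score.
    ASSUMPTION: the fusion rule (product of similarities rescaled to [0, 1])
    is illustrative and is not taken from the paper.
    """
    results = []
    for idx, emb in enumerate(database):
        image_sim = cosine(image_query, emb)   # query image vs. database image
        text_sim = cosine(text_query, emb)     # query text vs. database image
        fused = ((image_sim + 1) / 2) * ((text_sim + 1) / 2)
        results.append((idx, image_sim, text_sim, fused))
    return sorted(results, key=lambda r: r[3], reverse=True)

# Toy 3-d vectors standing in for VLM features (hypothetical values).
image_query = [1.0, 0.0, 0.0]
text_query = [0.0, 1.0, 0.0]
database = [
    [1.0, 0.0, 0.0],   # agrees only with the image query
    [0.0, 1.0, 0.0],   # agrees only with the text query
    [1.0, 1.0, 0.0],   # agrees with both
]
ranking = composed_scores(image_query, text_query, database)
```

With a multiplicative rule like this one, a database item has to agree with both the image and the text part of the query to rank first, which is why the third toy item comes out on top.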