Data Roaming And Quality Assessment For Composed Image Retrieval
2023 · Matan Levy, Rami Ben-Ari, Nir Darshan, et al.
Abstract
The task of Composed Image Retrieval (CoIR) involves queries that combine image and text modalities, allowing users to express their intent more effectively. However, current CoIR datasets are orders of magnitude smaller than other vision and language (V&L) datasets. Additionally, some of these datasets have noticeable issues, such as queries containing redundant modalities. To address these shortcomings, we introduce the Large Scale Composed Image Retrieval (LaSCo) dataset, a new CoIR dataset that is ten times larger than existing ones. Pre-training on LaSCo shows a noteworthy improvement in performance, even in zero-shot. Furthermore, we propose a new approach for analyzing CoIR datasets and methods, which detects modality redundancy or necessity in queries. We also introduce a new CoIR baseline, the Cross-Attention driven Shift Encoder (CASE). This baseline allows for early fusion of modalities using a cross-attention module and employs an additional auxiliary task dur
Related papers
- Instance-level Composed Image Retrieval (2025) 0.00
- Composed Image Retrieval For Remote Sensing (2024) 11.03
- Context-cir: Learning From Concepts In Text For Composed Image Retrieval (2025) 4.67
- Composed Object Retrieval: Object-level Retrieval Via Composed Expressions (2025) 1.91
- Infocir: Multimedia Analysis For Composed Image Retrieval (2026) 1.24
- Cir-cot: Towards Interpretable Composed Image Retrieval Via End-to-end Chain-of-thought Reasoning (2025) 0.00
- Scaling Prompt Instructed Zero Shot Composed Image Retrieval With Image-only Data (2025) 0.00
- Image Retrieval On Real-life Images With Pre-trained Vision-and-language Models (2021) 17.07