Image2sentence Based Asymmetrical Zero-shot Composed Image Retrieval
2024 Β· Yongchao Du, Min Wang, Wengang Zhou, et al.
Abstract
The task of composed image retrieval (CIR) aims to retrieve images based on the query image and the text describing the users' intent. Existing methods have made great progress with the advanced large vision-language (VL) model in CIR task, however, they generally suffer from two main issues: lack of labeled triplets for model training and difficulty of deployment on resource-restricted environments when deploying the large vision-language model. To tackle the above problems, we propose Image2Sentence based Asymmetric zero-shot composed image retrieval (ISA), which takes advantage of the VL model and only relies on unlabeled images for composition learning. In the framework, we propose a new adaptive token learner that maps an image to a sentence in the word embedding space of VL model. The sentence adaptively captures discriminative visual information and is further integrated with the text modifier. An asymmetric structure is devised for flexible deployment, in which the lightweight
Authors
(none)
Tags
Stats
Related papers
- From Mapping To Composing: A Two-stage Framework For Zero-shot Composed Image Retrieval (2025)0.00
- Fine-grained Zero-shot Composed Image Retrieval With Complementary Visual-semantic Integration (2026)1.24
- Modality And Task Adaptation For Enhanced Zero-shot Composed Image Retrieval (2024)0.00
- Isearle: Improving Textual Inversion For Zero-shot Composed Image Retrieval (2024)12.09
- Compositional Image Retrieval Via Instruction-aware Contrastive Learning (2024)0.00
- Zero-shot Composed Image Retrieval With Textual Inversion (2023)19.84
- Zero-shot Composed Text-image Retrieval (2023)0.00
- Scaling Prompt Instructed Zero Shot Composed Image Retrieval With Image-only Data (2025)0.00