Knowledge-enhanced Dual-stream Zero-shot Composed Image Retrieval
2024 · Yucheng Suo, Fan Ma, Linchao Zhu, et al.
Abstract
We study the zero-shot Composed Image Retrieval (ZS-CIR) task, which aims to retrieve a target image given a reference image and a description, without training on triplet datasets. Previous works generate pseudo-word tokens by projecting the reference image features into the text embedding space. However, they focus on the global visual representation and ignore detailed attributes such as color, object number, and layout. To address this challenge, we propose a Knowledge-Enhanced Dual-stream zero-shot composed image retrieval framework (KEDs). KEDs implicitly models the attributes of the reference images by incorporating a database. The database enriches the pseudo-word tokens by providing relevant images and captions, emphasizing shared attribute information in various aspects. In this way, KEDs recognizes the reference image from diverse perspectives. Moreover, KEDs adopts an extra stream that aligns pseudo-word tokens with textual concepts, leveraging pseudo-tr
Related papers
- SETR: A Two-stage Semantic-enhanced Framework For Zero-shot Composed Image Retrieval (2025)
- Fine-grained Zero-shot Composed Image Retrieval With Complementary Visual-semantic Integration (2026)
- Data-efficient Generalization For Zero-shot Composed Image Retrieval (2025)
- Training-free Zero-shot Composed Image Retrieval With Local Concept Reranking (2023)
- From Mapping To Composing: A Two-stage Framework For Zero-shot Composed Image Retrieval (2025)
- Multimodal Reasoning Agent For Zero-shot Composed Image Retrieval (2025)
- Zero-shot Composed Image Retrieval With Textual Inversion (2023)
- WISER: Wider Search, Deeper Thinking, And Adaptive Fusion For Training-free Zero-shot Composed Image Retrieval (2026)