PDV: Prompt Directional Vectors For Zero-shot Composed Image Retrieval
2025 · Osman Tursun, Sinan Kalkan, Simon Denman, et al.
Abstract
Zero-shot Composed Image Retrieval (ZS-CIR) enables image search using a reference image and a text prompt without requiring specialized text-image composition networks trained on large-scale paired data. However, current ZS-CIR approaches suffer from three critical limitations in their reliance on composed text embeddings: static query embedding representations, insufficient utilization of image embeddings, and suboptimal performance when fusing text and image embeddings. To address these challenges, we introduce the Prompt Directional Vector (PDV), a simple yet effective training-free enhancement that captures semantic modifications induced by user prompts. PDV enables three key improvements: (1) dynamic composed text embeddings where prompt adjustments are controllable via a scaling factor, (2) composed image embeddings through semantic transfer from text prompts to image features, and (3) weighted fusion of composed text and image embeddings that enhances retrieval by ba
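The three uses of PDV listed in the abstract can be illustrated with a small sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the PDV is taken to be the difference between the embedding of the prompt-modified text and the embedding of the reference text, `alpha` stands in for the scaling factor, `beta` for the fusion weight, and the function names and toy 3-dimensional vectors are hypothetical. In practice the embeddings would come from a pretrained vision-language encoder.

```python
import math

def normalize(v):
    """Scale a vector to unit length (raises on a zero vector)."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]

def prompt_directional_vector(ref_text_emb, composed_text_emb):
    """Assumed PDV: shift from the reference text embedding to the
    prompt-modified (composed) text embedding."""
    return [c - r for r, c in zip(ref_text_emb, composed_text_emb)]

def shift(base_emb, pdv, alpha):
    """Move a base embedding along the PDV; alpha is the assumed scaling factor.
    With a text base this gives a dynamic composed text embedding,
    with an image base a composed image embedding."""
    return normalize([b + alpha * d for b, d in zip(base_emb, pdv)])

def fuse(text_emb, image_emb, beta):
    """Weighted fusion of the composed text and composed image embeddings;
    beta is an assumed weight in [0, 1]."""
    return normalize([beta * t + (1 - beta) * i for t, i in zip(text_emb, image_emb)])

def cosine(a, b):
    """Cosine similarity, used here to score gallery candidates."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

# Toy embeddings (hypothetical values, not from the paper).
ref_text = [1.0, 0.0, 0.0]        # text describing the reference image
composed_text = [0.6, 0.8, 0.0]   # text describing the image after the prompt
ref_image = normalize([0.9, 0.0, 0.436])

pdv = prompt_directional_vector(ref_text, composed_text)
query_text = shift(ref_text, pdv, alpha=1.0)    # (1) dynamic composed text embedding
query_image = shift(ref_image, pdv, alpha=1.0)  # (2) composed image embedding
query = fuse(query_text, query_image, beta=0.5) # (3) weighted fusion

gallery = {"target": [0.5, 0.8, 0.3], "distractor": [1.0, 0.0, 0.1]}
ranking = sorted(gallery, key=lambda k: cosine(query, gallery[k]), reverse=True)
print(ranking)
```

Under these assumptions, setting `alpha` to 0 leaves the base embedding unchanged and larger values push the query further along the prompt direction, which is one way to read the abstract's "controllable via a scaling factor".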
Related papers
- Fine-grained Zero-shot Composed Image Retrieval With Complementary Visual-semantic Integration (2026) · 1.24
- From Mapping To Composing: A Two-stage Framework For Zero-shot Composed Image Retrieval (2025) · 0.00
- Data-efficient Generalization For Zero-shot Composed Image Retrieval (2025) · 2.26
- Zero Shot Composed Image Retrieval (2025) · 1.57
- Multimodal Reasoning Agent For Zero-shot Composed Image Retrieval (2025) · 0.00
- Sentence-level Prompts Benefit Composed Image Retrieval (2023) · 3.95
- Generative Editing In The Joint Vision-language Space For Zero-shot Composed Image Retrieval (2025) · 0.00
- Training-free Zero-shot Composed Image Retrieval With Local Concept Reranking (2023) · 0.00