Text-to-image Diffusion Models Are Great Sketch-photo Matchmakers
2024 · Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, et al.
Abstract
This paper, for the first time, explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR). We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos. This proficiency is underpinned by their robust cross-modal capabilities and shape bias, findings that are substantiated through our pilot studies. In order to harness pre-trained diffusion models effectively, we introduce a straightforward yet powerful strategy focused on two key aspects: selecting optimal feature layers and utilising visual and textual prompts. For the former, we identify which layers are most enriched with information and are best suited for the specific retrieval requirements (category-level or fine-grained). Then we employ visual and textual prompts to guide the model's feature extraction process, enabling it to generate more discriminative and contextually relevant cross-modal representations. Extensive
Related papers
- Diff-sbsr: Learning Multimodal Feature-enhanced Diffusion Models For Zero-shot Sketch-based 3D Shape Retrieval (2026) 0.00
- Adapt And Align To Improve Zero-shot Sketch-based Image Retrieval (2023) 0.00
- Zero-shot Everything Sketch-based Image Retrieval, And In Explainable Style (2023) 16.67
- Doodle To Search: Practical Zero-shot Sketch-based Image Retrieval (2019) 16.75
- Sketch And Text Synergy: Fusing Structural Contours And Descriptive Attributes For Fine-grained Image Retrieval (2026) 0.00
- CLIP For All Things Zero-shot Sketch-based Image Retrieval, Fine-grained Or Not (2023) 15.54
- Domain-smoothing Network For Zero-shot Sketch-based Image Retrieval (2021) 13.92
- Enhancing Product Search Interfaces With Sketch-guided Diffusion And Language Agents (2025) 0.00