Sketch And Text Synergy: Fusing Structural Contours And Descriptive Attributes For Fine-grained Image Retrieval
2026 · Siyuan Wang, Hanchen Gao, Guangming Zhu, et al.
Abstract
Fine-grained image retrieval from hand-drawn sketches or textual descriptions remains a critical challenge because of the inherent gap between modalities. Hand-drawn sketches capture complex structural contours but lack color and texture. Text supplies color and texture but omits spatial contours. Motivated by this complementarity, we propose the Sketch and Text Based Image Retrieval (STBIR) framework, which combines the rich color and texture cues of text with the structural outlines of sketches to achieve superior fine-grained retrieval performance. First, a curriculum-learning-driven robustness enhancement module makes the model more robust to queries of varying quality. Second, a category-knowledge-based feature space optimization module significantly boosts the model's representational power. Finally, we design a multi-stage cross-modal feature alignment mechanism to effectively mitigate the …
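The abstract does not say how the two query modalities are combined or how the curriculum is scheduled, so the following is only a minimal sketch of the general idea under stated assumptions, not STBIR's method. It assumes the sketch and text encoders already map into one shared embedding space, fuses the two query embeddings by a weighted sum (a hypothetical late-fusion choice), ranks gallery images by cosine similarity, and adds a simple linear pacing function that exposes the easiest training queries first. All names (`fuse_query`, `retrieve`, `curriculum_pool`, `alpha`, `start_frac`), the per-query difficulty scores, and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_query(sketch_emb: Sequence[float], text_emb: Sequence[float],
               alpha: float = 0.5) -> Vector:
    """Hypothetical late fusion: weighted sum of normalized sketch and text embeddings.

    alpha weights the sketch (structure) branch and 1 - alpha the text
    (color/texture) branch; both are assumed to share one embedding space.
    """
    s, t = l2_normalize(sketch_emb), l2_normalize(text_emb)
    return l2_normalize([alpha * a + (1 - alpha) * b for a, b in zip(s, t)])


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


def retrieve(query: Sequence[float],
             gallery: Sequence[Tuple[str, Sequence[float]]], k: int = 5) -> List[str]:
    """Return the ids of the k gallery images most similar to the query."""
    ranked = sorted(gallery, key=lambda item: cosine(query, item[1]), reverse=True)
    return [image_id for image_id, _ in ranked[:k]]


def curriculum_pool(difficulties: Sequence[float], epoch: int, total_epochs: int,
                    start_frac: float = 0.3) -> List[int]:
    """Hypothetical linear pacing: indices of the easiest queries usable at this epoch.

    The visible fraction grows linearly from start_frac at epoch 0 to 1.0 at the
    last epoch; queries are ordered from lowest to highest difficulty score.
    """
    progress = epoch / max(total_epochs - 1, 1)
    frac = min(1.0, start_frac + (1.0 - start_frac) * progress)
    n = max(1, math.ceil(frac * len(difficulties)))
    order = sorted(range(len(difficulties)), key=lambda i: difficulties[i])
    return order[:n]


if __name__ == "__main__":
    # Toy 3-d space: dim 0 ~ "shoe outline", dim 1 ~ "bag outline", dim 2 ~ "red".
    gallery = [("red_shoe", [1.0, 0.0, 1.0]),
               ("blue_shoe", [1.0, 0.0, -1.0]),
               ("red_bag", [0.0, 1.0, 1.0])]
    sketch = [1.0, 0.0, 0.0]  # shoe-shaped outline, no color information
    text = [0.0, 0.0, 1.0]    # "red", no shape information
    print(retrieve(fuse_query(sketch, text), gallery, k=1))  # → ['red_shoe']
```

In this toy gallery the sketch alone ties between the two shoes and the text alone ties between the two red items; only the fused query singles out the red shoe, which mirrors the complementarity argument in the abstract. A learned fusion or attention module would replace the fixed `alpha` in practice.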
Related papers
- You'll Never Walk Alone: A Sketch And Text Duet For Fine-grained Image Retrieval (2024)
- A Sketch Is Worth A Thousand Words: Image Retrieval With Text And Sketch (2022)
- Sketch Less For More: On-the-fly Fine-grained Sketch Based Image Retrieval (2020)
- Cross-modal Hierarchical Modelling For Fine-grained Sketch Based Image Retrieval (2020)
- Cross-modal Subspace Learning For Fine-grained Sketch-based Image Retrieval (2017)
- Back To The Drawing Board: Rethinking Scene-level Sketch-based Image Retrieval (2025)
- Composite Sketch+text Queries For Retrieving Objects With Elusive Names And Complex Interactions (2025)
- Dual-modal Prompting For Sketch-based Image Retrieval (2024)