Diff-SBSR: Learning Multimodal Feature-enhanced Diffusion Models For Zero-shot Sketch-based 3D Shape Retrieval
2026 · Hang Cheng, Fanhe Dong, Long Zeng
Abstract
This paper presents the first exploration of text-to-image diffusion models for zero-shot sketch-based 3D shape retrieval (ZS-SBSR). Existing sketch-based 3D shape retrieval methods struggle in zero-shot settings due to the absence of category supervision and the extreme sparsity of sketch inputs. Our key insight is that large-scale pretrained diffusion models inherently exhibit open-vocabulary capability and strong shape bias, making them well suited for zero-shot visual retrieval. We leverage a frozen Stable Diffusion backbone to extract and aggregate discriminative representations from intermediate U-Net layers for both sketches and rendered 3D views. Diffusion models struggle with sketches due to their extreme abstraction and sparsity, compounded by a significant domain gap from natural images. To address this limitation without costly retraining, we introduce a multimodal feature-enhanced strategy that conditions the frozen diffusion backbone with complementary visual and textual
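The abstract describes embedding both a sketch and the rendered views of each 3D shape from intermediate U-Net layers and then matching them. A minimal sketch of that retrieval step, under stated assumptions, follows. It is not the authors' implementation. It assumes each U-Net layer's activation has already been pooled into a vector. The aggregation choices are hypothetical, because the abstract does not specify them: per-layer L2 normalization followed by concatenation, averaging over rendered views, and cosine ranking. The function names `aggregate_layers`, `shape_descriptor` and `rank_shapes` are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
# Hypothetical input format: one pooled feature vector per intermediate U-Net layer.
LayerFeats = Sequence[Sequence[float]]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def aggregate_layers(layer_feats: LayerFeats) -> Vector:
    """Assumed layer aggregation: normalize each layer so none dominates,
    concatenate, then renormalize the joint descriptor."""
    concat: Vector = []
    for feat in layer_feats:
        concat.extend(l2_normalize(feat))
    return l2_normalize(concat)


def shape_descriptor(views: Sequence[LayerFeats]) -> Vector:
    """Assumed view aggregation: average the descriptors of a shape's
    rendered views, then renormalize."""
    descs = [aggregate_layers(v) for v in views]
    mean = [sum(col) / len(descs) for col in zip(*descs)]
    return l2_normalize(mean)


def rank_shapes(sketch: LayerFeats, gallery: Dict[str, Sequence[LayerFeats]]) -> List[str]:
    """Rank gallery shapes by cosine similarity to the sketch query.
    Both descriptors are unit-length, so the dot product equals cosine similarity."""
    query = aggregate_layers(sketch)
    scores = {
        name: sum(q * s for q, s in zip(query, shape_descriptor(views)))
        for name, views in gallery.items()
    }
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Toy usage: two "layers" of 2-D features per image (made-up numbers).
sketch = [[1.0, 0.0], [0.0, 1.0]]
gallery = {
    "chair": [[[1.0, 0.1], [0.1, 1.0]], [[0.9, 0.0], [0.0, 0.8]]],  # two rendered views
    "table": [[[0.0, 1.0], [1.0, 0.0]]],                            # one rendered view
}
print(rank_shapes(sketch, gallery))  # ['chair', 'table']
```

Normalizing each layer before concatenation prevents layers with larger magnitude or more dimensions from dominating the similarity. Mean pooling over views is the simplest option. Max pooling and learned view weighting are common alternatives.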
Related papers
- Text-to-image Diffusion Models Are Great Sketch-photo Matchmakers (2024) · 9.41
- Domain-smoothing Network For Zero-shot Sketch-based Image Retrieval (2021) · 13.92
- Stacked Semantic-guided Network For Zero-shot Sketch-based Image Retrieval (2019) · 0.00
- An Efficient Framework For Zero-shot Sketch-based Image Retrieval (2021) · 13.65
- Doodle To Search: Practical Zero-shot Sketch-based Image Retrieval (2019) · 16.75
- Semantic Adversarial Network For Zero-shot Sketch-based Image Retrieval (2019) · 10.74
- Adapt And Align To Improve Zero-shot Sketch-based Image Retrieval (2023) · 0.00
- Modality-aware Representation Learning For Zero-shot Sketch-based Image Retrieval (2024) · 8.60