Elevating All Zero-shot Sketch-based Image Retrieval Through Multimodal Prompt Learning
2024 · Mainak Singha, Ankit Jha, Divyam Gupta, et al.
Abstract
We address the challenges inherent in sketch-based image retrieval (SBIR) across various settings, including zero-shot SBIR, generalized zero-shot SBIR, and fine-grained zero-shot SBIR, by leveraging the vision-language foundation model CLIP. While recent endeavors have employed CLIP to enhance SBIR, these approaches predominantly rely on uni-modal prompt processing and fail to fully exploit CLIP's integrated visual and textual capabilities. To bridge this gap, we introduce SpLIP, a novel multi-modal prompt learning scheme designed to operate effectively with frozen CLIP backbones. We diverge from existing multi-modal prompting methods that treat visual and textual prompts independently or integrate them only in a limited fashion, which leads to suboptimal generalization. SpLIP implements a bi-directional prompt-sharing strategy that enables mutual knowledge exchange between CLIP's visual and textual encoders, fostering a more cohesive and synergistic prompt processing mechanism that signifi
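The abstract describes bi-directional prompt sharing only at a high level. As a rough illustration, here is a minimal sketch, assuming a hypothetical scheme in which each learnable visual prompt receives a linear projection of its paired text prompt and each text prompt receives a projection of its paired visual prompt. The function names, the additive update, the projection matrices `w_t2v` and `w_v2t`, and the toy dimensions are all illustrative assumptions, not the SpLIP authors' implementation. The frozen CLIP encoders themselves are not modeled; only the prompt exchange is shown.

```python
import random

# Hypothetical sketch of bi-directional prompt sharing between a text branch
# and a visual branch. The additive-projection scheme and all names are
# illustrative assumptions, NOT the SpLIP authors' code.

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    """Small random projection, standing in for a learnable linear layer."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def share_prompts(text_prompts, visual_prompts, w_t2v, w_v2t):
    """Exchange information between paired prompts in both directions.

    Both updates read the ORIGINAL prompts (a simultaneous update), so each
    branch is conditioned on the other rather than learned in isolation.
    """
    new_visual = [add(v, matvec(w_t2v, t)) for t, v in zip(text_prompts, visual_prompts)]
    new_text = [add(t, matvec(w_v2t, v)) for t, v in zip(text_prompts, visual_prompts)]
    return new_text, new_visual


# Toy usage: 4 prompt pairs, text width 8, visual width 12 (arbitrary sizes).
rng = random.Random(0)
n_prompts, d_text, d_vis = 4, 8, 12
text_prompts = [[rng.gauss(0, 1) for _ in range(d_text)] for _ in range(n_prompts)]
visual_prompts = [[rng.gauss(0, 1) for _ in range(d_vis)] for _ in range(n_prompts)]
w_t2v = random_matrix(d_vis, d_text, rng)  # maps text width -> visual width
w_v2t = random_matrix(d_text, d_vis, rng)  # maps visual width -> text width
shared_text, shared_visual = share_prompts(text_prompts, visual_prompts, w_t2v, w_v2t)
print(len(shared_text), len(shared_text[0]), len(shared_visual[0]))  # 4 8 12
```

Because the two encoders generally have different widths, some cross-modal projection is needed in each direction; in a prompt-learning setup with frozen backbones, only the prompts and such projections would be trained.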
Related papers
- Dual-modal Prompting For Sketch-based Image Retrieval (2024)
- CLIP For All Things Zero-shot Sketch-based Image Retrieval, Fine-grained Or Not (2023)
- Modality-aware Representation Learning For Zero-shot Sketch-based Image Retrieval (2024)
- CrossATNet - A Novel Cross-attention Based Framework For Sketch-based Image Retrieval (2021)
- Relation-aware Meta-learning For Zero-shot Sketch-based Image Retrieval (2024)
- Retrieval-enhanced Visual Prompt Learning For Few-shot Classification (2023)
- Adapt And Align To Improve Zero-shot Sketch-based Image Retrieval (2023)
- An Efficient Framework For Zero-shot Sketch-based Image Retrieval (2021)