Retrieval-enhanced Visual Prompt Learning For Few-shot Classification
2023 · Jintao Rong, Hao Chen, Linlin Ou, et al.
Abstract
The Contrastive Language-Image Pretraining (CLIP) model has been widely used in various downstream vision tasks. The few-shot learning paradigm has been widely adopted to augment its capacity for these tasks. However, current paradigms may struggle with fine-grained classification, such as satellite image recognition, due to widening domain gaps. To address this limitation, we propose retrieval-enhanced visual prompt learning (RePrompt), which introduces retrieval mechanisms to cache and reuse the knowledge of downstream tasks. RePrompt constructs a retrieval database from either training examples or external data if available, and uses a retrieval mechanism to enhance multiple stages of a simple prompt learning baseline, thus narrowing the domain gap. During inference, our enhanced model can reference similar samples brought by retrieval to make more accurate predictions. A detailed analysis reveals that retrieval helps to improve the distribution of late features, thus, improving gen
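The retrieval idea in the abstract (cache downstream examples, then consult similar ones at inference) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how retrieved samples are combined with the model's prediction, so the cosine-similarity lookup, the softmax-weighted neighbor vote, and the blending weight `alpha` are all assumptions made for illustration, as are the function names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_database(features, labels):
    """Cache (feature, label) pairs, e.g. from the few-shot training set."""
    return list(zip(features, labels))


def retrieve(database, query, k):
    """Return (similarity, label) for the k cached entries closest to the query."""
    scored = [(cosine(feat, query), label) for feat, label in database]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def knn_distribution(neighbors, num_classes, temperature=0.1):
    """Turn retrieved neighbors into class probabilities (assumed weighting)."""
    if not neighbors:
        # Nothing retrieved: fall back to a uniform distribution.
        return [1.0 / num_classes] * num_classes
    weights = [math.exp(sim / temperature) for sim, _ in neighbors]
    total = sum(weights)
    probs = [0.0] * num_classes
    for weight, (_, label) in zip(weights, neighbors):
        probs[label] += weight / total
    return probs


def predict(model_probs, database, query, k=2, alpha=0.5):
    """Blend the classifier's probabilities with retrieval-based ones.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    num_classes = len(model_probs)
    knn_probs = knn_distribution(retrieve(database, query, k), num_classes)
    blended = [(1 - alpha) * m + alpha * r for m, r in zip(model_probs, knn_probs)]
    best = max(range(num_classes), key=blended.__getitem__)
    return best, blended


# Toy usage: two classes, four cached features, an undecided classifier.
db = build_database(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
    [0, 0, 1, 1],
)
label, probs = predict([0.5, 0.5], db, [0.95, 0.05])
```

In this toy case the classifier alone is undecided, and the two nearest cached samples both belong to class 0, so the blended prediction favors class 0. In a CLIP setting the feature vectors would come from the image encoder rather than being written by hand.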
Related papers
- Elevating All Zero-shot Sketch-based Image Retrieval Through Multimodal Prompt Learning (2024) 6.34
- Dual Prompt Learning For Adapting Vision-language Models To Downstream Image-text Retrieval (2025) 0.00
- PiTL: Cross-modal Retrieval With Weakly-supervised Vision-language Pre-training Via Prompting (2023) 7.16
- Cross-modal Retrieval Meets Inference: Improving Zero-shot Classification With Cross-modal Retrieval (2023) 0.00
- Prompt Switch: Efficient CLIP Adaptation For Text-video Retrieval (2023) 11.93
- RECLIP: Resource-efficient CLIP By Training With Small Images (2023) 0.00
- PriorCLIP: Visual Prior Guided Vision-language Model For Remote Sensing Image-text Retrieval (2024) 0.00
- Learnable Prompt For Few-shot Semantic Segmentation In Remote Sensing Domain (2024) 7.16