Fine-grained Image Retrieval Via Dual-vision Adaptation
2025 · Xin Jiang, Meiqi Cao, Hao Tang, et al.
Abstract
Fine-Grained Image Retrieval (FGIR) faces challenges in learning discriminative visual representations to retrieve images with similar fine-grained features. Current leading FGIR solutions typically follow one of two regimes: enforcing pairwise similarity constraints in the semantic embedding space, or incorporating a localization sub-network to fine-tune the entire model. However, both regimes tend to overfit the training data while forgetting the knowledge gained from large-scale pre-training, which reduces their generalization ability. In this paper, we propose a Dual-Vision Adaptation (DVA) approach for FGIR, which guides the frozen pre-trained model to perform FGIR through collaborative sample and feature adaptation. Specifically, we design Object-Perceptual Adaptation, which modifies input samples to help the pre-trained model perceive critical objects and elements within objects that are helpful for category prediction. Meanwhile, we propose In-Context Adaptation, which introduces a sm
Related papers
- DVF: Advancing Robust And Accurate Fine-grained Image Retrieval With Retrieval Guidelines (2024) 9.03
- Adaptive Fine-grained Sketch-based Image Retrieval (2022) 9.76
- Adversarial Reconstruction Feedback For Robust Fine-grained Generalization (2025) 0.00
- UniFGVC: Universal Training-free Few-shot Fine-grained Vision Classification Via Attribute-aware Multimodal Retrieval (2025) 0.00
- Language-driven Fine-grained Retrieval (2025) 0.00
- One-shot Fine-grained Instance Retrieval (2017) 10.35
- ARNet: Self-supervised FG-SBIR With Unified Sample Feature Alignment And Multi-scale Token Recycling (2024) 5.84
- Coarse-to-fine: Learning Compact Discriminative Representation For Single-stage Image Retrieval (2023) 9.35