DALG: Deep Attentive Local And Global Modeling For Image Retrieval
2022 · Yuxin Song, Ruolin Zhu, Min Yang, et al.
Abstract
Deeply learned representations have achieved superior image retrieval performance in a retrieve-then-rerank manner. The recent state-of-the-art single-stage model, which heuristically fuses local and global features, achieves a promising trade-off between efficiency and effectiveness. However, we notice that the efficiency of existing solutions is still restricted because of their multi-scale inference paradigm. In this paper, we follow the single-stage art and obtain a further complexity-effectiveness balance by successfully getting rid of multi-scale testing. To achieve this goal, we abandon the widely used convolutional network, given its limitation in exploring diverse visual patterns, and resort to a fully attention-based framework for robust representation learning, motivated by the success of the Transformer. Besides applying a Transformer for global feature extraction, we devise a local branch composed of window-based multi-head attention and spatial attention to fully exploit local image patterns. F
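The local branch described in the abstract combines attention restricted to spatial windows with a spatial weighting of positions. The following is a minimal sketch of that general idea, not the paper's implementation: it assumes a single attention head with identity query/key/value projections and uses the L2 norm of each token as a stand-in spatial attention score. All function names (`window_partition`, `window_self_attention`, `spatial_attention_pool`, `local_descriptor`) are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def window_partition(fmap, win):
    # Split an H x W grid of feature vectors into non-overlapping win x win windows.
    h, w = len(fmap), len(fmap[0])
    assert h % win == 0 and w % win == 0, "grid must be divisible by window size"
    windows = []
    for top in range(0, h, win):
        for left in range(0, w, win):
            windows.append([fmap[top + i][left + j]
                            for i in range(win) for j in range(win)])
    return windows

def window_self_attention(tokens):
    # Scaled dot-product self-attention inside one window.
    # Assumption: one head, identity Q/K/V projections (no learned weights).
    dim = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in tokens])
        out.append([sum(wt * v[d] for wt, v in zip(weights, tokens))
                    for d in range(dim)])
    return out

def spatial_attention_pool(tokens):
    # Assumption: the spatial score of a position is the softmax of its L2 norm.
    scores = softmax([math.sqrt(dot(t, t)) for t in tokens])
    dim = len(tokens[0])
    pooled = [sum(s * t[d] for s, t in zip(scores, tokens)) for d in range(dim)]
    return pooled, scores

def local_descriptor(fmap, win=2):
    # Attend within each window, then pool all positions with spatial weights.
    attended = []
    for window in window_partition(fmap, win):
        attended.extend(window_self_attention(window))
    return spatial_attention_pool(attended)
```

Calling `local_descriptor(fmap, win=2)` on a 4 x 4 grid of feature vectors returns one pooled vector plus the 16 spatial weights. A trained model would replace the identity projections and the norm-based score with learned layers.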
Related papers
- Unifying Deep Local And Global Features For Image Search (2020) · 28.10
- Deep Image Retrieval: Learning Global Representations For Image Search (2016) · 19.67
- DOLG: Single-stage Image Retrieval With Deep Orthogonal Fusion Of Local And Global Features (2021) · 15.95
- All The Attention You Need: Global-local, Spatial-channel Attention For Image Retrieval (2021) · 13.97
- Learning Super-features For Image Retrieval (2022) · 4.31
- End-to-end Learning Of Deep Visual Representations For Image Retrieval (2016) · 19.66
- Global-to-local Or Local-to-global? Enhancing Image Retrieval With Efficient Local Search And Effective Global Re-ranking (2025) · 0.00
- Large-scale Image Retrieval With Attentive Deep Local Features (2016) · 30.63