End-to-end Learning Of Deep Visual Representations For Image Retrieval
2016 · Albert Gordo, Jon Almazan, Jerome Revaud, et al.
Abstract
While deep learning has become a key ingredient in the top-performing methods for many computer vision tasks, it has failed so far to bring similar improvements to instance-level image retrieval. In this article, we argue that the reasons for the underwhelming results of deep methods on image retrieval are threefold: i) noisy training data, ii) inappropriate deep architecture, and iii) suboptimal training procedure. We address all three issues. First, we leverage a large-scale but noisy landmark dataset and develop an automatic cleaning method that produces a suitable training set for deep retrieval. Second, we build on the recent R-MAC descriptor, show that it can be interpreted as a deep and differentiable architecture, and present improvements to enhance it. Last, we train this network with a siamese architecture that combines three streams with a triplet loss. At the end of the training process, the proposed architecture produces a global image representation in a single forward pass.
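To make the three-stream training objective concrete, here is a minimal sketch of a triplet ranking loss over global descriptors. It assumes a common hinge form on squared Euclidean distances between L2-normalized vectors; the abstract does not give the exact formula, and the function names and the margin value of 0.1 are illustrative assumptions, not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a descriptor to unit Euclidean norm (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(query, positive, negative, margin=0.1):
    """Hinge-style triplet loss; the margin value is an assumed placeholder.

    The loss is zero once the negative is farther from the query than the
    positive by at least `margin` (in squared distance).
    """
    q, p, n = (l2_normalize(v) for v in (query, positive, negative))
    return 0.5 * max(0.0, margin + squared_distance(q, p) - squared_distance(q, n))
```

In a siamese setup, the three descriptors would come from three weight-sharing copies of the same network applied to a query image, a relevant image, and a non-relevant image; here they are plain Python lists so the sketch stays dependency-free.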
Related papers
- Deep Image Retrieval: Learning Global Representations For Image Search (2016) 19.67
- DALG: Deep Attentive Local And Global Modeling For Image Retrieval (2022) 0.00
- Deep Learning For Instance Retrieval: A Survey (2021) 16.05
- Why-so-deep: Towards Boosting Previously Trained Models For Visual Place Recognition (2022) 7.81
- Learning Super-features For Image Retrieval (2022) 4.31
- Instance Image Retrieval By Learning Purely From Within The Dataset (2022) 0.00
- Unifying Deep Local And Global Features For Image Search (2020) 28.10
- A Benchmark On Tricks For Large-scale Image Retrieval (2019) 0.00