ACTNET: End-to-end Learning Of Feature Activations And Multi-stream Aggregation For Effective Instance Image Retrieval
2019 · Syed Sameed Husain, Eng-Jon Ong, Miroslaw Bober
Abstract
We propose a novel CNN architecture called ACTNET for robust instance image retrieval from large-scale datasets. Our key innovation is a learnable activation layer designed to improve the signal-to-noise ratio (SNR) of deep convolutional feature maps. Further, we introduce a controlled multi-stream aggregation, where complementary deep features from different convolutional layers are optimally transformed and balanced using our novel activation layers, before aggregation into a global descriptor. Importantly, the learnable parameters of our activation blocks are explicitly trained, together with the CNN parameters, in an end-to-end manner minimising triplet loss. This means that our network jointly learns the CNN filters and their optimal activation and aggregation for retrieval tasks. To our knowledge, this is the first time parametric functions have been used to control and learn optimal aggregation. We conduct an in-depth experimental study on three non-linear activation functions:
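The pipeline the abstract describes (per-stream activation, aggregation into one global descriptor, training with a triplet loss) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration: the scaled sigmoid `parametric_sigmoid` with parameters `alpha` and `beta` is a hypothetical stand-in and not necessarily one of the paper's activation functions, sum-pooling and concatenation are a simplified aggregation, and the margin value is arbitrary. In the actual network these parameters would be learned by backpropagation together with the CNN filters; the sketch only shows the forward computation.

```python
import numpy as np

def parametric_sigmoid(x, alpha, beta):
    """Hypothetical learnable activation: sigmoid with slope alpha and offset beta."""
    return 1.0 / (1.0 + np.exp(-alpha * (x - beta)))

def aggregate_streams(feature_maps, params):
    """Activate each stream, sum-pool spatially, concatenate, then L2-normalise.

    feature_maps: list of arrays shaped (channels, height, width), one per layer.
    params: list of (alpha, beta) pairs, one per stream (assumed parametrisation).
    """
    pooled = []
    for fmap, (alpha, beta) in zip(feature_maps, params):
        activated = parametric_sigmoid(fmap, alpha, beta)
        pooled.append(activated.sum(axis=(1, 2)))  # one value per channel
    descriptor = np.concatenate(pooled)
    return descriptor / np.linalg.norm(descriptor)

def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge on squared Euclidean distances: max(0, margin + d(a,p) - d(a,n))."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, margin + d_pos - d_neg)

# Toy usage: two streams with different channel counts and resolutions.
rng = np.random.default_rng(0)
streams = [rng.normal(size=(4, 8, 8)), rng.normal(size=(6, 4, 4))]
descriptor = aggregate_streams(streams, params=[(1.0, 0.0), (2.0, 0.5)])
```

The resulting `descriptor` has one entry per channel across both streams (10 here) and unit L2 norm, so descriptors from different images can be compared directly by Euclidean distance inside `triplet_loss`.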
Related papers
- Image Retrieval Using Multi-scale CNN Features Pooling (2020)
- What Is The Best Practice For CNNs Applied To Visual Instance Retrieval? (2016)
- Adversarial Soft-detection-based Aggregation Network For Image Retrieval (2018)
- Class-weighted Convolutional Features For Visual Instance Search (2017)
- Deep Image Retrieval: Learning Global Representations For Image Search (2016)
- DenserNet: Weakly Supervised Visual Localization Using Multi-scale Feature Aggregation (2020)
- Local Features And Visual Words Emerge In Activations (2019)
- End-to-end Learning Of Deep Visual Representations For Image Retrieval (2016)