Volumetric Transformer Networks
2020 · Seungryong Kim, Sabine Süsstrunk, Mathieu Salzmann
Abstract
Existing techniques to encode spatial invariance within deep convolutional neural networks (CNNs) apply the same warping field to all the feature channels. This does not account for the fact that the individual feature channels can represent different semantic parts, which can undergo different spatial transformations w.r.t. a canonical configuration. To overcome this limitation, we introduce a learnable module, the volumetric transformer network (VTN), that predicts channel-wise warping fields so as to reconfigure intermediate CNN features spatially and channel-wisely. We design our VTN as an encoder-decoder network, with modules dedicated to letting the information flow across the feature channels, to account for the dependencies between the semantic parts. We further propose a loss function defined between the warped features of pairs of instances, which improves the localization ability of VTN. Our experiments show that VTN consistently boosts the features' representation power and
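The contrast the abstract draws, one warping field shared by all channels versus one field per channel, can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: in VTN the fields are predicted by an encoder-decoder network, whereas here they are passed in as arrays. The function names (`bilinear_sample`, `warp_shared`, `warp_channelwise`) and the pixel-offset flow convention are hypothetical.

```python
import numpy as np

def bilinear_sample(feat, flow):
    """Warp one channel `feat` (H, W) by a flow field `flow` (2, H, W).

    Assumed convention: flow[0] is the vertical offset and flow[1] the
    horizontal offset, in pixels. Sampling positions are clamped to the border.
    """
    H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    y = np.clip(ys + flow[0], 0, H - 1)
    x = np.clip(xs + flow[1], 0, W - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom

def warp_shared(feats, flow):
    """Apply the same field `flow` (2, H, W) to every channel of `feats` (C, H, W)."""
    return np.stack([bilinear_sample(f, flow) for f in feats])

def warp_channelwise(feats, flows):
    """Apply a separate field flows[c] (2, H, W) to each channel feats[c]."""
    return np.stack([bilinear_sample(f, fl) for f, fl in zip(feats, flows)])
```

With `flows` of shape `(C, 2, H, W)`, each channel can move independently, which is the extra freedom the channel-wise formulation adds; passing the same field for every channel reduces `warp_channelwise` to `warp_shared`.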
Related papers
- Transform-invariant Convolutional Neural Networks For Image Classification And Search (2019)
- Analyzing Local Representations Of Self-supervised Vision Transformers (2023)
- Group Invariant Deep Representations For Image Instance Retrieval (2016)
- TransVCL: Attention-enhanced Video Copy Localization Network With Flexible Supervision (2022)
- Tensor Composition Net For Visual Relationship Prediction (2020)
- MVTN: Multi-view Transformation Network For 3D Shape Recognition (2020)
- DenserNet: Weakly Supervised Visual Localization Using Multi-scale Feature Aggregation (2020)
- One Is All: Bridging The Gap Between Neural Radiance Fields Architectures With Progressive Volume Distillation (2022)