ViewFormer: View Set Attention For Multi-view 3D Shape Understanding
2023 · Hongyu Sun, Yongcai Wang, Peng Wang, et al.
Abstract
This paper presents ViewFormer, a simple yet effective model for multi-view 3D shape recognition and retrieval. We systematically investigate the existing methods for aggregating multi-view information and propose a novel "view set" perspective, which minimizes the assumptions made about relations among the views and frees up representational flexibility. We devise an adaptive attention model to capture pairwise and higher-order correlations among the elements of the view set. The learned multi-view correlations are aggregated into an expressive view set descriptor for recognition and retrieval. Experiments show that the proposed method performs strongly across different tasks and datasets. For instance, with only 2 attention blocks and 4.8M learnable parameters, ViewFormer reaches 98.8% recognition accuracy on ModelNet40 for the first time, exceeding the previous best method by 1.1%. On the challenging RGBD dataset, our method achieves 98.4% recognition accuracy, which is a 4.1% absolute i
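The "view set" idea in the abstract can be illustrated with a minimal NumPy sketch: self-attention is applied to per-view feature vectors with no positional encoding, so the views are treated as an unordered set, and the attended features are pooled into a single descriptor. The function names, the single-head attention, the residual connection, and the max-plus-mean pooling are all assumptions made for illustration; the abstract does not specify the view encoder, the block internals, or the pooling used in ViewFormer.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def view_set_attention(views, w_q, w_k, w_v):
    # views: (num_views, dim) array of per-view features.
    # No positional encoding is added, so reordering the views
    # only reorders the output rows (permutation equivariance).
    q, k, v = views @ w_q, views @ w_k, views @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise view correlations
    return views + softmax(scores, axis=-1) @ v  # residual connection (assumed)

def view_set_descriptor(views, blocks):
    # Stacking blocks lets later blocks mix already-mixed features,
    # which is one way to reach beyond pairwise correlations.
    x = views
    for w_q, w_k, w_v in blocks:
        x = view_set_attention(x, w_q, w_k, w_v)
    # Order-independent pooling (assumed): concatenate max and mean over views.
    return np.concatenate([x.max(axis=0), x.mean(axis=0)])

rng = np.random.default_rng(0)
dim, num_views = 8, 6
views = rng.normal(size=(num_views, dim))  # stand-in for encoded view images
blocks = [tuple(0.1 * rng.normal(size=(dim, dim)) for _ in range(3))
          for _ in range(2)]               # 2 attention blocks, random weights
descriptor = view_set_descriptor(views, blocks)
```

Because nothing in the sketch depends on view order, shuffling the rows of `views` leaves `descriptor` unchanged up to floating-point rounding, which is the property the set perspective is meant to provide.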
Related papers
- PREMA: Part-based Recurrent Multi-view Aggregation Network For 3D Shape Retrieval (2021) 3.58
- View N-gram Network For 3D Object Retrieval (2019) 13.05
- MVTN: Multi-view Transformation Network For 3D Shape Recognition (2020) 21.44
- LATFormer: Locality-aware Point-view Fusion Transformer For 3D Shape Recognition (2021) 6.34
- PVRNet: Point-view Relation Neural Network For 3D Shape Recognition (2018) 13.11
- SCA-PVNet: Self-and-cross Attention Based Aggregation Of Point Cloud And Multi-view For 3D Object Retrieval (2023) 10.07
- Generalized Multi-view Embedding For Visual Recognition And Cross-modal Retrieval (2016) 14.69
- FusionBERT: Multi-view Image-3D Retrieval Via Cross-attention Visual Fusion And Normal-aware 3D Encoder (2026) 0.00