Self-supervised Modal And View Invariant Feature Learning
2020 · Longlong Jing, Yucheng Chen, Ling Zhang, et al.
Abstract
Most existing self-supervised feature learning methods for 3D data learn 3D features from either point cloud data or multi-view images. By exploring the inherent multi-modality attributes of 3D objects, in this paper we propose to jointly learn modal-invariant and view-invariant features from different modalities, including image, point cloud, and mesh, with heterogeneous networks for 3D data. To learn modal- and view-invariant features, we propose two types of constraints: a cross-modal invariance constraint and a cross-view invariance constraint. The cross-modal invariance constraint forces the network to maximize the agreement of features from different modalities for the same object, while the cross-view invariance constraint forces the network to maximize the agreement of features from different image views of the same object. The quality of the learned features has been tested on different downstream tasks with three modalities of data including point cloud, multi-view images, an
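The two constraints can be illustrated with a small sketch. The abstract says only that agreement between features is maximized; it does not give the loss. The code below therefore assumes an InfoNCE-style contrastive loss as a stand-in, which is a common choice and not necessarily the authors' formulation. The function names, the temperature value, and the toy 2-D feature vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def agreement_loss(feats_a, feats_b, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's stated loss).

    feats_a[i] and feats_b[i] are features of the same object; every
    other pairing in the batch is treated as a negative.
    """
    n = len(feats_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(feats_a[i], feats_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the matching object.
        total += log_denom - logits[i]
    return total / n

# Hypothetical toy features for two objects.
image_view1 = [[1.0, 0.0], [0.0, 1.0]]   # image network, view 1
image_view2 = [[0.9, 0.1], [0.1, 0.9]]   # image network, view 2
point_cloud = [[0.8, 0.2], [0.2, 0.8]]   # point cloud network

# Cross-view term: two image views of the same object should agree.
cross_view = agreement_loss(image_view1, image_view2)
# Cross-modal term: image and point cloud features of the same object should agree.
cross_modal = agreement_loss(image_view1, point_cloud)
total_loss = cross_view + cross_modal
```

In this sketch the loss is small when features of the same object line up across views or modalities, and grows when the pairing is shuffled. A mesh branch would add further cross-modal terms of the same form.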
Related papers
- Generalized Multi-view Embedding For Visual Recognition And Cross-modal Retrieval (2016) 14.69
- Enhanced Cross-modal 3D Retrieval Via Tri-modal Reconstruction (2025) 0.00
- Multiview-consistent Semi-supervised Learning For 3D Human Pose Estimation (2019) 13.05
- Multimodal Clustering Networks For Self-supervised Learning From Unlabeled Videos (2021) 13.28
- Deeppoint3d: Learning Discriminative Local Descriptors Using Deep Metric Learning On 3D Point Clouds (2019) 9.59
- Multiple Discrimination And Pairwise CNN For View-based 3D Object Retrieval (2020) 14.27
- Instance-variant Loss With Gaussian RBF Kernel For 3D Cross-modal Retriveal (2023) 0.00
- Multiview Image-based Localization (2025) 0.00