COM3D: Leveraging Cross-view Correspondence And Cross-modal Mining For 3D Retrieval
2024 · Hao Wu, Ruochong Li, Hao Wang, et al.
Abstract
In this paper, we investigate an open research task of cross-modal retrieval between 3D shapes and textual descriptions. Previous approaches mainly rely on point cloud encoders for feature extraction, which may ignore key inherent features of 3D shapes, including depth, spatial hierarchy, geometric continuity, etc. To address this issue, we propose COM3D, making the first attempt to exploit the cross-view correspondence and cross-modal mining to enhance the retrieval performance. Notably, we augment the 3D features through a scene representation transformer, to generate cross-view correspondence features of 3D shapes, which enrich the inherent features and enhance their compatibility with text matching. Furthermore, we propose to optimize the cross-modal matching process based on the semi-hard negative example mining method, in an attempt to improve the learning efficiency. Extensive quantitative and qualitative experiments demonstrate the superiority of our proposed COM3D, achieving s
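The abstract names semi-hard negative example mining but does not give its formulation. As a rough illustration only, the sketch below assumes the common triplet-loss variant: for each 3D shape, pick the most similar non-matching caption that is still less similar than the matching caption and lies within a margin of it. The function names, the cosine similarity, and the `margin` value are assumptions made for this example, not details taken from COM3D.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semi_hard_triplet_loss(shape_emb, text_emb, margin=0.2):
    """Mean triplet loss using one semi-hard negative caption per shape.

    Assumes shape_emb[i] and text_emb[i] form a matched pair (hypothetical
    batch layout). Returns (mean_loss, chosen_negative_indices), where an
    index is None when the batch holds no semi-hard negative for that shape.
    """
    losses, chosen = [], []
    for i, shape in enumerate(shape_emb):
        pos = cosine(shape, text_emb[i])
        best_j, best_sim = None, -math.inf
        for j, text in enumerate(text_emb):
            if j == i:
                continue
            sim = cosine(shape, text)
            # Semi-hard: less similar than the positive, yet within the margin.
            if pos - margin < sim < pos and sim > best_sim:
                best_j, best_sim = j, sim
        chosen.append(best_j)
        # The selection rule guarantees margin - pos + best_sim > 0.
        losses.append(0.0 if best_j is None else margin - pos + best_sim)
    return sum(losses) / len(losses), chosen


# Toy batch: two shape embeddings and their two caption embeddings.
shapes = [[1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.0], [0.6, 0.8]]
loss, negatives = semi_hard_triplet_loss(shapes, texts, margin=0.5)
```

In this toy batch only the first shape has a semi-hard negative (the second caption); the second shape contributes zero loss. Skipping the hardest negatives in this way is commonly motivated by training stability, which fits the abstract's stated aim of improving learning efficiency.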
Related papers
- Enhanced Cross-modal 3D Retrieval Via Tri-modal Reconstruction (2025) [0.00]
- SCA3D: Enhancing Cross-modal 3D Retrieval Via 3D Shape And Caption Paired Data Augmentation (2025) [4.17]
- Crossover: 3D Scene Cross-modal Alignment (2025) [4.52]
- SCA-PVNet: Self-and-cross Attention Based Aggregation Of Point Cloud And Multi-view For 3D Object Retrieval (2023) [10.07]
- SAMURAI: Shape-aware Multimodal Retrieval For 3D Object Identification (2025) [0.00]
- Generating Holistic 3D Scene Abstractions For Text-based Image Retrieval (2016) [9.03]
- Contrastive Masked Auto-encoders Based Self-supervised Hashing For 2D Image And 3D Point Cloud Cross-modal Retrieval (2024) [2.26]
- Describe, Adapt And Combine: Empowering CLIP Encoders For Open-set 3D Object Retrieval (2025) [2.51]