SCA3D: Enhancing Cross-modal 3D Retrieval Via 3D Shape And Caption Paired Data Augmentation
2025 · Junlong Ren, Hao Wu, Hui Xiong, et al.
Abstract
The cross-modal 3D retrieval task aims to achieve mutual matching between text descriptions and 3D shapes. This has the potential to enhance the interaction between natural language and the 3D environment, especially in robotics and embodied artificial intelligence (AI) applications. However, the scarcity and high cost of 3D data constrain the performance of existing cross-modal 3D retrieval methods. These methods rely heavily on features derived from a limited number of 3D shapes, resulting in poor generalization across diverse scenarios. To address this challenge, we introduce SCA3D, a novel online data augmentation method for 3D shapes and captions in cross-modal 3D retrieval. Our approach uses the LLaVA model to create a component library, captioning each segmented part of every 3D shape in the dataset. This library facilitates the generation of extensive new 3D-text pairs containing new semantic features. We employ both inter and intra distances to al
Related papers
- COM3D: Leveraging Cross-view Correspondence And Cross-modal Mining For 3D Retrieval (2024) | 3.58
- Enhanced Cross-modal 3D Retrieval Via Tri-modal Reconstruction (2025) | 0.00
- Y^2Seq2Seq: Cross-modal Representation Learning For 3D Shape And Text By Joint Reconstruction And Prediction Of View And Word Sequences (2018) | 12.02
- SCA-PVNet: Self-and-cross Attention Based Aggregation Of Point Cloud And Multi-view For 3D Object Retrieval (2023) | 10.07
- SAMURAI: Shape-aware Multimodal Retrieval For 3D Object Identification (2025) | 0.00
- Joint Learning Of 3D Shape Retrieval And Deformation (2021) | 11.08
- Extending DeepSDF For Automatic 3D Shape Retrieval And Similarity Transform Estimation (2020) | 0.00
- 3D Shape Knowledge Graph For Cross-domain 3D Shape Retrieval (2022) | 5.24