PointCloud-Text Matching: Benchmark Datasets and a Baseline
2024 · Yanglin Feng, Yang Qin, Dezhong Peng, et al.
Abstract
In this paper, we present and study a new instance-level retrieval task: PointCloud-Text Matching (PTM), which aims to identify the exact cross-modal instance that matches a given point-cloud query or text query. PTM has potential applications in various scenarios, such as indoor/urban-canyon localization and scene retrieval. However, there is a lack of suitable and targeted datasets for PTM in practice. To address this issue, we present a new PTM benchmark dataset, namely SceneDepict-3D2T. We observe that the data poses significant challenges due to its inherent characteristics, such as the sparsity, noise, or disorder of point clouds and the ambiguity, vagueness, or incompleteness of texts, which render existing cross-modal matching methods ineffective for PTM. To overcome these challenges, we propose a PTM baseline, named Robust PointCloud-Text Matching method (RoMa). RoMa consists of two key modules: a Dual Attention Perception module (DAP) and a Robust Negative Contrastive Learnin
Related papers
- Parts2words: Learning Joint Embedding Of Point Clouds And Texts By Bidirectional Matching Between Parts And Words (2021) 9.96
- Pmpguard: Catching Pseudo-matched Pairs In Remote Sensing Image-text Retrieval (2025) 0.00
- Stacmr: Scene-text Aware Cross-modal Retrieval (2020) 10.48
- TIPCB: A Simple But Effective Part-based Convolutional Baseline For Text-based Person Search (2021) 20.24
- Scene Text Retrieval Via Joint Text Detection And Similarity Learning (2021) 16.16
- Text-based Person Search With Limited Data (2021) 15.69
- Enhancing Image-text Matching With Adaptive Feature Aggregation (2024) 6.34
- Hpointloc: Point-based Indoor Place Recognition Using Synthetic RGB-D Images (2022) 10.66