VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition
2024 · Yun-Jin Li, Mariia Gladkova, Yan Xia, et al.
Abstract
Cross-modal place recognition methods are flexible GPS-alternatives under varying environment conditions and sensor setups. However, this task is non-trivial since extracting consistent and robust global descriptors from different modalities is challenging. To tackle this issue, we propose Voxel-Cross-Pixel (VXP), a novel camera-to-LiDAR place recognition framework that enforces local similarities in a self-supervised manner and effectively brings global context from images and LiDAR scans into a shared feature space. Specifically, VXP is trained in three stages: first, we deploy a visual transformer to compactly represent input images. Secondly, we establish local correspondences between image-based and point cloud-based feature spaces using our novel geometric alignment module. We then aggregate local similarities into an expressive shared latent space. Extensive experiments on the three benchmarks (Oxford RobotCar, ViViD++ and KITTI) demonstrate that our method surpasses the state-o
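The abstract describes mapping images and LiDAR scans into a shared feature space so that a camera query can be matched against LiDAR-based global descriptors. The snippet below is only a minimal sketch of that retrieval step, assuming descriptors have already been computed; it does not reproduce VXP's networks or its three-stage training. The function names (`l2_normalize`, `retrieve`) and the toy 3-D descriptors are hypothetical and chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero descriptor")
    return [x / norm for x in vec]


def retrieve(query_desc, database_descs, top_k=1):
    """Return indices of the top_k database descriptors most similar to the query."""
    q = l2_normalize(query_desc)
    scores = []
    for idx, desc in enumerate(database_descs):
        d = l2_normalize(desc)
        cosine = sum(a * b for a, b in zip(q, d))
        scores.append((cosine, idx))
    # Highest cosine similarity first.
    scores.sort(key=lambda s: s[0], reverse=True)
    return [idx for _, idx in scores[:top_k]]


# Toy data (assumption): tiny hand-written vectors standing in for learned
# global descriptors of LiDAR scans and of one camera image.
lidar_database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
image_query = [0.5, 0.9, 0.1]

best = retrieve(image_query, lidar_database, top_k=2)
print(best)  # indices of the two closest LiDAR descriptors
```

In practice the database would hold one descriptor per mapped place, and the returned index would identify the candidate location for the query image.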