VGGT-MPR: VGGT-Enhanced Multimodal Place Recognition in Autonomous Driving Environments
2026 · Jingyi Xu, Zhangshuo Qi, Zhongmiao Yan, et al.
Abstract
In autonomous driving, robust place recognition is critical for global localization and loop closure detection. Fusing camera and LiDAR data in multimodal place recognition (MPR) has shown promise in overcoming the limitations of unimodal methods, but existing MPR methods largely rely on hand-crafted fusion strategies and heavily parameterized backbones that require costly retraining. To address this, we propose VGGT-MPR, a multimodal place recognition framework that adopts the Visual Geometry Grounded Transformer (VGGT) as a unified geometric engine for both global retrieval and re-ranking. In the global retrieval stage, VGGT extracts geometrically rich visual embeddings through prior depth-aware and point map supervision, and densifies sparse LiDAR point clouds with predicted depth maps to improve structural representation. This enhances the discriminative ability of the fused multimodal features and produces global descriptors for fast retrieval. Beyond glob
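The two operations named in the abstract for the global retrieval stage, densifying a sparse LiDAR cloud with a predicted depth map and retrieving places by global descriptor, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the depth map is taken as a given array rather than produced by VGGT, the LiDAR points are assumed to be already expressed in the camera frame, and the function names (`backproject_depth`, `densify`, `retrieve`), the pinhole intrinsics `K`, and the `stride` parameter are all hypothetical choices made for illustration.

```python
import numpy as np

def backproject_depth(depth, K, stride=4):
    """Lift a depth map (H, W) to camera-frame 3D points (N, 3).

    Assumes a pinhole camera with intrinsics K (3x3); pixels with
    non-positive depth are skipped. `stride` subsamples the image grid.
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H:stride, 0:W:stride]  # pixel rows (v) and columns (u)
    z = depth[v, u]
    valid = z > 0
    u, v, z = u[valid], v[valid], z[valid]
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def densify(lidar_cam, depth, K, stride=4):
    """Append back-projected depth points to a sparse LiDAR cloud (M, 3).

    Assumption: lidar_cam is already in the camera coordinate frame.
    """
    return np.concatenate([lidar_cam, backproject_depth(depth, K, stride)], axis=0)

def retrieve(query, database, k=5):
    """Return indices and cosine similarities of the top-k database descriptors."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q
    order = np.argsort(-sims)[:k]
    return order, sims[order]

# Toy usage with synthetic data (no real sensor input).
K = np.array([[4.0, 0.0, 4.0],
              [0.0, 4.0, 4.0],
              [0.0, 0.0, 1.0]])
depth = np.full((8, 8), 2.0)             # stand-in for a predicted depth map
lidar = np.zeros((3, 3)) + [0.0, 0.0, 5.0]  # three sparse LiDAR points
dense = densify(lidar, depth, K)

database = np.eye(3)                     # three toy global descriptors
query = np.array([0.1, 0.9, 0.0])
top_idx, top_sim = retrieve(query, database, k=1)
```

In a real pipeline the densified cloud would be fed to a point-cloud encoder and fused with image features before descriptors are compared; the sketch only shows the geometric lifting and the nearest-neighbour lookup in isolation.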
Related papers
- UniPR-3D: Towards Universal Visual Place Recognition with Visual Geometry Grounded Transformer (2025)
- VLM-Guided Visual Place Recognition for Planet-Scale Geo-Localization (2025)
- EmbodiedPlace: Learning Mixture-of-Features with Embodied Constraints for Visual Place Recognition (2025)
- MixVPR: Feature Mixing for Visual Place Recognition (2023)
- Range and Bird's Eye View Fused Cross-Modal Visual Place Recognition (2025)
- MultiRes-NetVLAD: Augmenting Place Recognition Training with Low-Resolution Imagery (2022)
- StructVPR++: Distill Structural and Semantic Knowledge with Weighting Samples for Visual Place Recognition (2025)
- Evaluation of Visual Place Recognition Methods for Image Pair Retrieval in 3D Vision and Robotics (2026)