LaVPR: Benchmarking Language And Vision For Place Recognition
2026 · Ofer Idan, Dan Badur, Yosi Keller, et al.
Abstract
Visual Place Recognition (VPR) often fails under extreme environmental changes and perceptual aliasing. Furthermore, standard systems cannot perform "blind" localization from verbal descriptions alone, a capability needed for applications such as emergency response. To address these challenges, we introduce LaVPR, a large-scale benchmark that extends existing VPR datasets with over 650,000 rich natural-language descriptions. Using LaVPR, we investigate two paradigms: Multi-Modal Fusion for enhanced robustness and Cross-Modal Retrieval for language-based localization. Our results show that language descriptions yield consistent gains in visually degraded conditions, with the most significant impact on smaller backbones. Notably, adding language allows compact models to rival the performance of much larger vision-only architectures. For cross-modal retrieval, we establish a baseline using Low-Rank Adaptation (LoRA) and Multi-Similarity loss, which substantially outperforms standard contr
Related papers
- Evaluation Of Visual Place Recognition Methods For Image Pair Retrieval In 3D Vision And Robotics (2026)
- MultiRes-NetVLAD: Augmenting Place Recognition Training With Low-resolution Imagery (2022)
- VLM-guided Visual Place Recognition For Planet-scale Geo-localization (2025)
- Focus On Local: Finding Reliable Discriminative Regions For Visual Place Recognition (2025)
- Range And Bird's Eye View Fused Cross-modal Visual Place Recognition (2025)
- MixVPR: Feature Mixing For Visual Place Recognition (2023)
- EmbodiedPlace: Learning Mixture-of-features With Embodied Constraints For Visual Place Recognition (2025)
- Data-efficient Large Scale Place Recognition With Graded Similarity Supervision (2023)