Regressing Transformers For Data-efficient Visual Place Recognition
2024 · María Leyva-Vallina, Nicola Strisciuglio, Nicolai Petkov
Abstract
Visual place recognition is a critical task in computer vision, especially for localization and navigation systems. Existing methods often rely on contrastive learning: image descriptors are trained so that similar images lie close together in a latent space and dissimilar images lie farther apart. However, descriptor distances learned this way do not reliably reflect how similar two images are, particularly when training uses binary pairwise labels, so complex re-ranking strategies are required. This work instead frames place recognition as a regression problem, using camera field-of-view overlap as the ground-truth similarity for learning. Optimizing image descriptors to align directly with graded similarity labels improves ranking without expensive re-ranking, and offers data-efficient training and strong generalization across several benchmark datasets.
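The regression framing in the abstract can be illustrated with a minimal sketch. It assumes that descriptor similarity is measured with cosine similarity and that the regression objective is a mean squared error against graded labels in [0, 1]; the abstract confirms neither choice, and the function names `cosine_similarity`, `regression_loss`, and `rank_database` are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (N, D) descriptor arrays."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a_n * b_n, axis=1)


def regression_loss(desc_a, desc_b, overlap):
    """Mean squared error between predicted similarity and graded labels.

    `overlap` holds one label per image pair, e.g. a field-of-view overlap
    in [0, 1]. MSE is an assumed loss, not taken from the paper.
    """
    pred = cosine_similarity(desc_a, desc_b)
    return float(np.mean((pred - overlap) ** 2))


def rank_database(query, database):
    """Return database indices sorted from most to least similar to the query.

    `query` has shape (D,), `database` has shape (M, D). Ranking uses the
    descriptor similarity alone, with no separate re-ranking stage.
    """
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q
    # A stable sort keeps the original order among equally similar entries.
    return np.argsort(-sims, kind="stable")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    desc_a = rng.normal(size=(4, 8))
    desc_b = rng.normal(size=(4, 8))
    overlap = np.array([0.9, 0.1, 0.5, 0.0])  # illustrative graded labels
    print("loss:", regression_loss(desc_a, desc_b, overlap))
    print("ranking:", rank_database(desc_a[0], desc_b))
```

With binary labels every positive pair would share the same target of 1. Graded labels instead let a pair with 90% overlap be pulled closer than a pair with 50% overlap, and this is what allows the descriptor distance to serve directly as a ranking score.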
Related papers
- Data-efficient Large Scale Place Recognition With Graded Similarity Supervision (2023) 16.32
- Graph-based Non-linear Least Squares Optimization For Visual Place Recognition In Changing Environments (2020) 7.16
- EigenPlaces: Training Viewpoint Robust Models For Visual Place Recognition (2023) 15.46
- Fast, Compact And Highly Scalable Visual Place Recognition Through Sequence-based Matching Of Overloaded Representations (2020) 9.41
- Breaking The Frame: Visual Place Recognition By Overlap Prediction (2024) 7.80
- Are Local Features All You Need For Cross-domain Visual Place Recognition? (2023) 13.80
- \(R^{2}\)Former: Unified Retrieval And Reranking Transformer For Place Recognition (2023) 18.31
- PlaceFormer: Transformer-based Visual Place Recognition Using Multi-scale Patch Selection And Fusion (2024) 7.81