C-BEV: Contrastive Bird's Eye View Training For Cross-view Image Retrieval And 3-DoF Pose Estimation
2023 · Florian Fervers, Sebastian Bullinger, Christoph Bodensteiner, et al.
Abstract
To find the geolocation of a street-view image, cross-view geolocalization (CVGL) methods typically perform image retrieval on a database of georeferenced aerial images and determine the location from the visually most similar match. Recent approaches focus mainly on settings where street-view and aerial images are preselected to align w.r.t. translation or orientation, but struggle in challenging real-world scenarios where varying camera poses have to be matched to the same aerial image. We propose a novel trainable retrieval architecture that uses bird's eye view (BEV) maps rather than vectors as embedding representation, and explicitly addresses the many-to-one ambiguity that arises in real-world scenarios. The BEV-based retrieval is trained using the same contrastive setting and loss as classical retrieval. Our method C-BEV surpasses the state-of-the-art on the retrieval task on multiple datasets by a large margin. It is particularly effective in challenging many-to-one scenarios
Related papers
- BEV-CV: Birds-eye-view Transform For Cross-view Geo-localisation (2023)
- Cross-view Image Geo-localization With Panorama-BEV Co-retrieval Network (2024)
- Cross-view Image Retrieval -- Ground To Aerial Image Retrieval Through Deep Learning (2020)
- From Street To Orbit: Training-free Cross-view Retrieval Via Location Semantics And LLM Guidance (2025)
- Range And Bird's Eye View Fused Cross-modal Visual Place Recognition (2025)
- VIGOR: Cross-view Image Geo-localization Beyond One-to-one Retrieval (2020)
- Just Zoom In: Cross-view Geo-localization Via Autoregressive Zooming (2026)
- VICI: VLM-instructed Cross-view Image-localisation (2025)