From Street To Orbit: Training-free Cross-view Retrieval Via Location Semantics And LLM Guidance
2025 · Jeongho Min, Dongyoung Kim, Jaehyup Lee
Abstract
Cross-view image retrieval, particularly street-to-satellite matching, is a critical task for applications such as autonomous navigation, urban planning, and localization in GPS-denied environments. However, existing approaches often require supervised training on curated datasets and rely on panoramic or UAV-based images, which limits real-world deployment. In this paper, we present a simple yet effective cross-view image retrieval framework that leverages a pretrained vision encoder and a large language model (LLM) and requires no additional training. Given a monocular street-view image, our method extracts geographic cues through web-based image search and LLM-based location inference, generates a satellite query via a geocoding API, and retrieves matching tiles using a pretrained vision encoder (e.g., DINOv2) with PCA-based whitening for feature refinement. Despite using no ground-truth supervision or finetuning, our proposed method outperforms prior learning-based approaches on the benchm
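The retrieval step named in the abstract (pretrained features refined by PCA-based whitening, then matched against satellite tiles) can be illustrated with a minimal sketch. This is not the authors' implementation: random vectors stand in for DINOv2 embeddings, the function names `fit_pca_whitening`, `apply_whitening`, and `retrieve` are hypothetical, and the output dimension, the `eps` regularizer, and cosine-similarity ranking are assumptions chosen for illustration.

```python
import numpy as np


def fit_pca_whitening(feats, out_dim, eps=1e-6):
    """Estimate a PCA-whitening transform from gallery features of shape (n, d)."""
    mean = feats.mean(axis=0)
    centered = feats - mean
    cov = centered.T @ centered / max(len(feats) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:out_dim]  # keep the top-variance directions
    # Scale each kept direction by 1/sqrt(variance) so projected features
    # have roughly unit variance along every axis.
    proj = eigvecs[:, order] / np.sqrt(eigvals[order] + eps)
    return mean, proj


def apply_whitening(feats, mean, proj):
    """Center, project, and L2-normalize features (works for (d,) or (n, d))."""
    z = (feats - mean) @ proj
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + 1e-12)


def retrieve(query, gallery, top_k=5):
    """Rank gallery rows by cosine similarity to an L2-normalized query."""
    sims = gallery @ query
    return np.argsort(-sims)[:top_k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumption: random anisotropic vectors stand in for satellite-tile embeddings.
    tiles = rng.normal(size=(200, 32)) * np.linspace(3.0, 0.5, 32)
    mean, proj = fit_pca_whitening(tiles, out_dim=16)
    tiles_w = apply_whitening(tiles, mean, proj)
    # Assumption: the query is a slightly perturbed copy of tile 17.
    query_w = apply_whitening(tiles[17] + 0.01 * rng.normal(size=32), mean, proj)
    print(retrieve(query_w, tiles_w, top_k=3))
```

In this sketch the whitening transform is fitted on the gallery features only and then applied to both gallery and query, so no labels or paired street/satellite data are involved, which is consistent with the training-free setting the abstract describes.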
Related papers
- VICI: VLM-instructed Cross-view Image-localisation (2025)
- Geo-localization Via Ground-to-satellite Cross-view Image Retrieval (2022)
- C-BEV: Contrastive Bird's Eye View Training For Cross-view Image Retrieval And 3-DoF Pose Estimation (2023)
- Coming Down To Earth: Satellite-to-street View Synthesis For Geo-localization (2021)
- Cross-view Image Retrieval -- Ground To Aerial Image Retrieval Through Deep Learning (2020)
- Cross-view Image Geo-localization With Panorama-BEV Co-retrieval Network (2024)
- Cross-view Image Matching For Geo-localization In Urban Environments (2017)
- Localizing And Orienting Street Views Using Overhead Imagery (2016)