Just Zoom In: Cross-view Geo-localization Via Autoregressive Zooming
2026 · Yunus Talha Erzurumlu, Jiyong Kwag, Alper Yilmaz
Abstract
Cross-view geo-localization (CVGL) estimates a camera's location by matching a street-view image to geo-referenced overhead imagery, enabling GPS-denied localization and navigation. Existing methods almost universally formulate CVGL as an image-retrieval problem in a contrastively trained embedding space. This ties performance to large batches and hard negative mining, and it ignores both the geometric structure of maps and the coverage mismatch between street-view and overhead imagery. In particular, salient landmarks visible from the street view can fall outside a fixed satellite crop, making retrieval targets ambiguous and limiting explicit spatial inference over the map. We propose Just Zoom In, an alternative formulation that performs CVGL via autoregressive zooming over a city-scale overhead map. Starting from a coarse satellite view, the model takes a short sequence of zoom-in decisions to select a terminal satellite cell at a target resolution, without contrastive losses or har
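The zooming procedure described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the authors' implementation: the `Region` type, the `k` x `k` split at each step, the number of steps, and the `oracle_scorer` stand-in (which cheats by knowing the target coordinates) are all hypothetical. In the actual method the per-step scores would presumably come from a learned model conditioned on the street-view image.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Region:
    """A square patch of the overhead map (hypothetical representation)."""
    x: float     # left edge in map coordinates
    y: float     # top edge in map coordinates
    size: float  # side length

    def children(self, k: int) -> List["Region"]:
        # Split into a k x k grid, ordered row by row.
        step = self.size / k
        return [
            Region(self.x + col * step, self.y + row * step, step)
            for row in range(k)
            for col in range(k)
        ]

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px < self.x + self.size
                and self.y <= py < self.y + self.size)


# A scorer maps (current region, candidate children) to one score per child.
Scorer = Callable[[Region, List[Region]], List[float]]


def zoom_in(root: Region, scorer: Scorer, steps: int,
            k: int = 2) -> Tuple[Region, List[int]]:
    """Greedy autoregressive zoom: each decision depends on the region
    selected by the previous decisions."""
    region, path = root, []
    for _ in range(steps):
        kids = region.children(k)
        scores = scorer(region, kids)
        best = max(range(len(kids)), key=scores.__getitem__)
        path.append(best)
        region = kids[best]
    return region, path


def oracle_scorer(target: Tuple[float, float]) -> Scorer:
    """Toy stand-in for a learned policy: scores 1.0 for the child that
    contains the known target location, 0.0 otherwise."""
    px, py = target

    def score(_parent: Region, kids: List[Region]) -> List[float]:
        return [1.0 if kid.contains(px, py) else 0.0 for kid in kids]

    return score


root = Region(0.0, 0.0, 1024.0)  # coarse view of the whole map
cell, path = zoom_in(root, oracle_scorer((700.0, 100.0)), steps=3, k=2)
print(cell, path)
```

With a 2 x 2 split and three steps, the terminal cell has one eighth of the root's side length; the returned `path` is the sequence of zoom-in decisions, which is the quantity an autoregressive model would be trained to predict.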
Related papers
- BEV-CV: Birds-eye-view Transform For Cross-view Geo-localisation (2023) 5.84
- VIGOR: Cross-view Image Geo-localization Beyond One-to-one Retrieval (2020) 21.49
- Cross-view Geo-localization, Image Retrieval, Multiscale Geometric Modeling, Frequency Domain Enhancement (2026) 0.00
- CLNet: Cross-view Correspondence Makes A Stronger Geo-localizationer (2025) 0.00
- Cross-view Image Matching For Geo-localization In Urban Environments (2017) 17.16
- VICI: VLM-instructed Cross-view Image-localisation (2025) 2.51
- Geo-localization Via Ground-to-satellite Cross-view Image Retrieval (2022) 12.54
- Cross-view Image Geo-localization With Panorama-bev Co-retrieval Network (2024) 13.94