Pix2Map: Cross-modal Retrieval For Inferring Street Maps From Images
2023 · Xindi Wu, Kwunfung Lau, Francesco Ferroni, et al.
Abstract
Self-driving vehicles rely on urban street maps for autonomous navigation. In this paper, we introduce Pix2Map, a method for inferring urban street map topology directly from ego-view images, as needed to continually update and expand existing maps. This is a challenging task, as we need to infer a complex urban road topology directly from raw image data. The main insight of this paper is that this problem can be posed as cross-modal retrieval by learning a joint, cross-modal embedding space for images and existing maps, represented as discrete graphs that encode the topological layout of the visual surroundings. We conduct our experimental evaluation using the Argoverse dataset and show that it is indeed possible to accurately retrieve street maps corresponding to both seen and unseen roads solely from image data. Moreover, we show that our retrieved maps can be used to update or expand existing maps and even show proof-of-concept results for visual localization and image retrieval fr
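The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes that images and map graphs have already been encoded into the same embedding space, and that candidates are ranked by cosine similarity. The abstract does not state the similarity measure or the encoder design, so the function names (`retrieve_map`, `cosine_similarity`), the three-dimensional toy vectors, and the map identifiers below are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def retrieve_map(image_embedding, map_library, top_k=1):
    """Return the top_k (map_id, similarity) pairs closest to the image embedding.

    map_library maps a map identifier to its embedding. In a real system the
    embeddings would come from learned image and graph encoders; here they
    are supplied by hand (assumption for illustration only).
    """
    scored = [
        (map_id, cosine_similarity(image_embedding, emb))
        for map_id, emb in map_library.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy, hand-written embeddings standing in for encoder outputs (hypothetical).
map_library = {
    "straight_road": [1.0, 0.0, 0.0],
    "t_junction": [0.0, 1.0, 0.0],
    "four_way_intersection": [0.0, 0.0, 1.0],
}
image_embedding = [0.1, 0.9, 0.2]

best = retrieve_map(image_embedding, map_library, top_k=2)
print(best[0][0])  # t_junction
```

In this toy setup the query lies closest to the `t_junction` entry, so that map graph is returned first. Retrieving from a library of existing graphs, rather than generating a graph from scratch, is what lets the inference problem be treated as nearest-neighbor search.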
Related papers
- From Street To Orbit: Training-free Cross-view Retrieval Via Location Semantics And LLM Guidance (2025) 0.00
- Cross-view Image Retrieval -- Ground To Aerial Image Retrieval Through Deep Learning (2020) 5.24
- Cross-view Image Geo-localization With Panorama-BEV Co-retrieval Network (2024) 13.94
- VIGOR: Cross-view Image Geo-localization Beyond One-to-one Retrieval (2020) 21.49
- Localizing And Orienting Street Views Using Overhead Imagery (2016) 17.26
- Urbangraphembeddings: Learning And Evaluating Spatially Grounded Multimodal Embeddings For Urban Science (2026) 0.00
- Cross-view Image Matching For Geo-localization In Urban Environments (2017) 17.16
- Mapping, Localization And Path Planning For Image-based Navigation Using Visual Features And Map (2018) 11.93