G3: An Effective And Adaptive Framework For Worldwide Geolocalization Using Large Multi-modality Models
2024 · Pengyue Jia, Yiding Liu, Xiaopeng Li, et al.
Abstract
Worldwide geolocalization aims to predict, at the coordinate level, where a photo was taken anywhere on Earth. It is very challenging due to 1) the difficulty of capturing subtle location-aware visual semantics, and 2) the heterogeneous geographical distribution of image data. As a result, existing studies have clear limitations when scaled to a worldwide context: they may easily confuse distant images with similar visual content, or fail to adapt to locations worldwide that have different amounts of relevant data. To resolve these limitations, we propose G3, a novel framework based on Retrieval-Augmented Generation (RAG). In particular, G3 consists of three steps, i.e., Geo-alignment, Geo-diversification, and Geo-verification, which optimize both the retrieval and generation phases of worldwide geolocalization. During Geo-alignment, our solution jointly learns expressive multi-modal representations for images, GPS and textual descriptions, which allows us to capture location-
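As a rough illustration of the retrieval phase that a RAG-based geolocalizer relies on, the sketch below ranks reference entries by cosine similarity to a query embedding and returns their GPS coordinates as candidates. This is a generic embedding-retrieval example, not G3's implementation: the function names, the three-dimensional toy embeddings, and the database layout are all hypothetical, and the abstract does not specify how G3 encodes or indexes its data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Guard against zero vectors, which have no defined direction.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(query_emb, database, k=2):
    """Return the GPS coordinates of the k entries most similar to the query."""
    ranked = sorted(
        database,
        key=lambda entry: cosine(query_emb, entry["emb"]),
        reverse=True,
    )
    return [entry["gps"] for entry in ranked[:k]]


# Hypothetical reference database: hand-written toy embeddings paired with
# (latitude, longitude). A real system would use learned image embeddings.
DATABASE = [
    {"emb": [0.9, 0.1, 0.0], "gps": (48.8584, 2.2945)},    # Paris
    {"emb": [0.8, 0.2, 0.1], "gps": (45.7640, 4.8357)},    # Lyon
    {"emb": [0.0, 0.1, 0.9], "gps": (35.6586, 139.7454)},  # Tokyo
]

# Toy query embedding standing in for an encoded photo.
candidates = retrieve_candidates([1.0, 0.0, 0.0], DATABASE, k=2)
print(candidates)
```

In a retrieval-augmented setup, candidates like these would be passed as context to the generation phase, which produces the final coordinate prediction.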
Related papers
- Img2Loc: Revisiting Image Geolocalization Using Multi-modality Foundation Models And Image-based Retrieval-augmented Generation (2024) · 9.23
- GeoCLIP: Clip-inspired Alignment Between Locations And Images For Effective Worldwide Geo-localization (2023) · 5.84
- VIGOR: Cross-view Image Geo-localization Beyond One-to-one Retrieval (2020) · 21.49
- Leveraging EfficientNet And Contrastive Learning For Accurate Global-scale Location Estimation (2021) · 9.03
- A Unified Hierarchical Framework For Fine-grained Cross-view Geo-localization Over Large-scale Scenarios (2025) · 0.00
- Geo-localization Via Ground-to-satellite Cross-view Image Retrieval (2022) · 12.54
- Tiger: A Unified Framework For Time, Images And Geo-location Retrieval (2026) · 0.00
- Accurate 3-DoF Camera Geo-localization Via Ground-to-satellite Image Matching (2022) · 12.17