Img2Loc: Revisiting Image Geolocalization Using Multi-modality Foundation Models And Image-based Retrieval-augmented Generation
2024 · Zhongliang Zhou, Jielu Zhang, Zihan Guan, et al.
Abstract
Geolocating precise locations from images presents a challenging problem in computer vision and information retrieval. Traditional methods typically employ either classification, which divides the Earth's surface into grid cells and classifies images accordingly, or retrieval, which identifies locations by matching images against a database of image-location pairs. However, classification-based approaches are limited by the cell size and cannot yield precise predictions, while retrieval-based systems usually suffer from poor search quality and inadequate coverage of the global landscape at varied scales and aggregation levels. To overcome these drawbacks, we present Img2Loc, a novel system that redefines image geolocalization as a text generation task. This is achieved using cutting-edge large multi-modality models such as GPT4V or LLaVA with retrieval-augmented generation. Img2Loc first employs CLIP-based representations to generate an image-based coordinate query database. It then uniquely
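The retrieval-augmented idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_database`, `retrieve_coordinates`, `build_prompt`), the prompt wording, and the toy two-dimensional vectors standing in for CLIP embeddings are all assumptions made for illustration. It shows only the general pattern of looking up the coordinates of the most similar reference images and placing them in a text prompt for a multi-modality model.

```python
import numpy as np


def build_database(embeddings, coords):
    """Return L2-normalized embeddings and their (lat, lon) coordinates.

    Hypothetical helper: in a real system the embeddings would come from
    an image encoder such as CLIP; here they are plain arrays.
    """
    emb = np.asarray(embeddings, dtype=np.float64)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return emb, np.asarray(coords, dtype=np.float64)


def retrieve_coordinates(query, db_emb, db_coords, k=3):
    """Return coordinates of the k database entries most similar to query."""
    q = np.asarray(query, dtype=np.float64)
    q = q / np.linalg.norm(q)
    sims = db_emb @ q               # cosine similarity, since both are normalized
    top = np.argsort(-sims)[:k]     # indices sorted by descending similarity
    return [(float(db_coords[i, 0]), float(db_coords[i, 1])) for i in top]


def build_prompt(neighbors):
    """Format retrieved coordinates as context for a text-generation model.

    The wording is an assumption, not the prompt used in the paper.
    """
    lines = [f"- ({lat:.4f}, {lon:.4f})" for lat, lon in neighbors]
    return (
        "Estimate the latitude and longitude of the attached image.\n"
        "Coordinates of visually similar reference images:\n"
        + "\n".join(lines)
    )


# Toy data: three reference "images" with made-up 2-D embeddings.
db_emb, db_coords = build_database(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [(48.8566, 2.3522), (35.6762, 139.6503), (40.7128, -74.0060)],
)
neighbors = retrieve_coordinates([0.9, 0.1], db_emb, db_coords, k=2)
prompt = build_prompt(neighbors)
```

In this sketch the generated prompt would be sent, together with the query image, to a model such as GPT4V or LLaVA; that call is omitted because its details are not given in the text above.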
Related papers
- G3: An Effective And Adaptive Framework For Worldwide Geolocalization Using Large Multi-modality Models (2024)
- MegaLoc: One Retrieval To Place Them All (2025)
- Revisiting IM2GPS In The Deep Learning Era (2017)
- Location Sensitive Image Retrieval And Tagging (2020)
- Leveraging EfficientNet And Contrastive Learning For Accurate Global-scale Location Estimation (2021)
- GeoCLIP: Clip-inspired Alignment Between Locations And Images For Effective Worldwide Geo-localization (2023)
- VIGOR: Cross-view Image Geo-localization Beyond One-to-one Retrieval (2020)
- Investigating The Role Of Image Retrieval For Visual Localization -- An Exhaustive Benchmark (2022)