UrbanCross: Enhancing Satellite Image-text Retrieval With Cross-domain Adaptation
2024 · Siru Zhong, Xixuan Hao, Yibo Yan, et al.
Abstract
Urbanization challenges underscore the necessity for effective satellite image-text retrieval methods to swiftly access specific information enriched with geographic semantics for urban applications. However, existing methods often overlook significant domain gaps across diverse urban landscapes, primarily focusing on enhancing retrieval performance within single domains. To tackle this issue, we present UrbanCross, a new framework for cross-domain satellite image-text retrieval. UrbanCross leverages a high-quality, cross-domain dataset enriched with extensive geo-tags from three countries to highlight domain diversity. It employs the Large Multimodal Model (LMM) for textual refinement and the Segment Anything Model (SAM) for visual augmentation, achieving a fine-grained alignment of images, segments and texts, yielding a 10% improvement in retrieval performance. Additionally, UrbanCross incorporates an adaptive curriculum-based source sampler and a weighted adversarial cross-domain fi
Related papers
- Remote Sensing Cross-modal Text-image Retrieval Based On Global And Local Information (2022) · 19.48
- Deep Unsupervised Contrastive Hashing For Large-scale Cross-modal Text-image Retrieval In Remote Sensing (2022) · 0.00
- From Street To Orbit: Training-free Cross-view Retrieval Via Location Semantics And LLM Guidance (2025) · 0.00
- Cross-view Image Retrieval -- Ground To Aerial Image Retrieval Through Deep Learning (2020) · 5.24
- DUDE: Diffusion-based Unsupervised Cross-domain Image Retrieval (2025) · 0.00
- Exploring A Fine-grained Multiscale Method For Cross-modal Remote Sensing Image Retrieval (2022) · 16.73
- Unsupervised Contrastive Hashing For Cross-modal Retrieval In Remote Sensing (2022) · 13.84
- Transcending Fusion: A Multi-scale Alignment Method For Remote Sensing Image-text Retrieval (2024) · 11.92