Towards A Multimodal Framework For Remote Sensing Image Change Retrieval And Captioning
2024 · Roger Ferrod, Luigi di Caro, Dino Ienco
Abstract
Recently, there has been increasing interest in multimodal applications that integrate text with other modalities, such as images, audio and video, to facilitate natural language interactions with multimodal AI systems. While applications involving standard modalities have been extensively explored, there is still a lack of investigation into specific data modalities such as remote sensing (RS) data. Despite the numerous potential applications of RS data, including environmental protection, disaster monitoring and land planning, available solutions are predominantly focused on specific tasks like classification, captioning and retrieval. These solutions often overlook the unique characteristics of RS data, such as its capability to systematically provide information on the same geographical areas over time. This ability enables continuous monitoring of changes in the underlying landscape. To address this gap, we propose a novel foundation model for bi-temporal RS image pairs, in the co
Related papers
- Large Language Models For Captioning And Retrieving Remote Sensing Images (2024)
- A Novel Self-supervised Cross-modal Image Retrieval Method In Remote Sensing (2022)
- Exploring A Fine-grained Multiscale Method For Cross-modal Remote Sensing Image Retrieval (2022)
- Remote Sensing Cross-modal Text-image Retrieval Based On Global And Local Information (2022)
- Composed Image Retrieval For Remote Sensing (2024)
- Remote Sensing Retrieval-augmented Generation: Bridging Remote Sensing Imagery And Comprehensive Knowledge With A Multi-modal Dataset And Retrieval-augmented Generation Model (2025)
- CMIR-NET: A Deep Learning Based Model For Cross-modal Retrieval In Remote Sensing (2019)
- Fast-then-fine: A Two-stage Framework With Multi-granular Representation For Cross-modal Retrieval In Remote Sensing (2026)