Composed Image Retrieval For Remote Sensing
2024 · Bill Psomas, Ioannis Kakogeorgiou, Nikos Efthymiadis, et al.
Abstract
This work introduces composed image retrieval to remote sensing. It allows querying a large image archive with image examples altered by a textual description, which gives the query more descriptive power than a unimodal query, whether visual or textual. The textual part can modify various attributes, such as shape, color, or context. We introduce a novel method that fuses image-to-image and text-to-image similarity. We demonstrate that a vision-language model possesses sufficient descriptive power and that no further learning step or training data is necessary. We present a new evaluation benchmark focused on color, context, density, existence, quantity, and shape modifications. Our work not only sets the state of the art for this task, but also serves as a foundational step in addressing a gap in the field of remote sensing image retrieval. Code at: https://github.com/billpsomas/rscir
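The abstract does not spell out how the two similarities are fused, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the query image, the query text, and every archive image have already been embedded into a shared space by some vision-language model, and that fusion is a simple convex combination controlled by a hypothetical weight `lam`. The function names and toy vectors are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def fused_ranking(query_image, query_text, archive, lam=0.5):
    """Rank archive indices by a weighted sum of two similarities.

    lam is a hypothetical weight (an assumption, not from the paper):
    lam = 0 uses image-to-image similarity only,
    lam = 1 uses text-to-image similarity only.
    """
    scores = []
    for idx, emb in enumerate(archive):
        s_img = cosine(query_image, emb)  # image-to-image similarity
        s_txt = cosine(query_text, emb)   # text-to-image similarity
        scores.append(((1 - lam) * s_img + lam * s_txt, idx))
    # Highest fused score first; ties broken by archive index.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scores]


# Toy 2-D embeddings standing in for real model outputs.
archive = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_image = [1.0, 0.0]  # matches archive[0] visually
query_text = [0.0, 1.0]   # matches archive[1] textually
ranking = fused_ranking(query_image, query_text, archive, lam=0.5)
print(ranking)
```

With equal weights, the archive item that partly matches both the image and the text (index 2) ranks first, which is the behavior a composed query is meant to produce. Setting `lam` to 0 or 1 recovers the unimodal rankings.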
Related papers
- Composing Text And Image For Image Retrieval - An Empirical Odyssey (2018) 18.71
- Data Roaming And Quality Assessment For Composed Image Retrieval (2023) 11.39
- Large Language Models For Captioning And Retrieving Remote Sensing Images (2024) 0.00
- Instance-level Composed Image Retrieval (2025) 0.00
- Infocir: Multimedia Analysis For Composed Image Retrieval (2026) 1.24
- Towards A Multimodal Framework For Remote Sensing Image Change Retrieval And Captioning (2024) 8.85
- A Novel Self-supervised Cross-modal Image Retrieval Method In Remote Sensing (2022) 8.35
- Image Retrieval On Real-life Images With Pre-trained Vision-and-language Models (2021) 17.07