Exploring A Fine-grained Multiscale Method For Cross-modal Remote Sensing Image Retrieval
2022 · Zhiqiang Yuan, Wenkai Zhang, Kun Fu, et al.
Abstract
Remote sensing (RS) cross-modal text-image retrieval has attracted extensive attention for its flexible input and efficient querying. However, traditional methods ignore the multi-scale and redundant targets characteristic of RS images, which degrades retrieval accuracy. To cope with multi-scale scarcity and target redundancy in the RS multimodal retrieval task, we propose a novel asymmetric multimodal feature matching network (AMFMN). Our model adapts to multi-scale feature inputs, favors multi-source retrieval methods, and can dynamically filter redundant features. AMFMN employs a multi-scale visual self-attention (MVSA) module to extract the salient features of an RS image and uses the visual features to guide the text representation. Furthermore, to alleviate the positive-sample ambiguity caused by the strong intraclass similarity of RS images, we propose a triplet loss function with a dynamic variable margin based on prior similarity of sa
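The abstract is cut off before it defines the loss, so the following is only a minimal sketch of what a triplet loss with a variable margin can look like. The function name, the `base_margin` value, and the rule that the margin shrinks linearly with the prior similarity are all assumptions made for illustration; they are not taken from the paper.

```python
def dynamic_margin_triplet_loss(sim_pos, sim_neg, prior_sim, base_margin=0.2):
    """Hinge-style triplet loss on similarity scores.

    sim_pos   -- similarity between the query and its matching sample
    sim_neg   -- similarity between the query and a non-matching sample
    prior_sim -- prior similarity in [0, 1] between the matching and
                 non-matching samples (hypothetical input)

    Assumed rule (not from the paper): the more alike the two samples
    are a priori, the smaller the margin they are forced apart by.
    """
    margin = base_margin * (1.0 - prior_sim)
    return max(0.0, margin - sim_pos + sim_neg)


# A dissimilar negative gets the full margin; a near-duplicate gets none.
loss_far = dynamic_margin_triplet_loss(0.6, 0.5, prior_sim=0.0)
loss_near = dynamic_margin_triplet_loss(0.6, 0.5, prior_sim=1.0)
```

With a fixed margin, both calls would be penalized equally. Under the assumed rule, the negative that is nearly identical to the positive contributes no loss, which is one way to soften the ambiguity that strong intraclass similarity creates.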
Related papers
- A Novel Self-supervised Cross-modal Image Retrieval Method In Remote Sensing (2022) 8.35
- Transcending Fusion: A Multi-scale Alignment Method For Remote Sensing Image-text Retrieval (2024) 11.92
- Fast-then-fine: A Two-stage Framework With Multi-granular Representation For Cross-modal Retrieval In Remote Sensing (2026) 0.00
- Remote Sensing Cross-modal Text-image Retrieval Based On Global And Local Information (2022) 19.48
- CMIR-NET: A Deep Learning Based Model For Cross-modal Retrieval In Remote Sensing (2019) 13.34
- Scale-semantic Joint Decoupling Network For Image-text Retrieval In Remote Sensing (2022) 8.82
- Towards A Multimodal Framework For Remote Sensing Image Change Retrieval And Captioning (2024) 8.85
- An Unsupervised Cross-modal Hashing Method Robust To Noisy Training Image-text Correspondences In Remote Sensing (2022) 7.16