
Deep Semantic-Attention Proxy Hashing With Coarse-to-Fine Representation for Multilabel Remote Sensing Image Retrieval

Abstract

Deep hashing techniques are widely used in remote sensing image retrieval due to their fast retrieval and low cost. However, existing multilabel retrieval methods encounter two main challenges. First, the input sample pixels commonly include irrelevant backgrounds, which degrade semantic feature representation. Second, traditional vision transformer models directly adopt fine-grained feature extraction, which generates redundant long sequences and significantly increases computational costs. To address these problems, a deep semantic-attention proxy hashing framework (DSaPH) is proposed for multilabel remote sensing image retrieval. Specifically, to achieve dynamic feature extraction from images, a coarse-to-fine representation module with two stages, coarse segmentation and fine segmentation, is proposed. It identifies the object-related patches in the coarse-grained segmentation through critical region detection and aggregates the global features of the original image in combination with a feature reuse mechanism. To exclude background interference and focus on object information, the object-focus loss highlights the object information through Grad-CAM and generates object-fused image-focus hash codes. Furthermore, class-related proxy learning is introduced, in which proxy hash codes supervise the semantic representation of the object-fused hash codes, and both proxy hash codes and object-fused hash codes are embedded into a unified Hamming space to capture the Hamming similarity of neighboring multilabel data. Extensive experimental evaluations on three benchmark datasets show that the proposed DSaPH framework achieves excellent performance in multilabel remote sensing image retrieval. The code for the DSaPH framework is publicly available at https://github.com/QinLab-WFU/DSaPH.git.
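
To make the notion of retrieval in a unified Hamming space concrete, the following is a minimal sketch of generic hash-code retrieval, not the DSaPH implementation. It assumes codes take values in {-1, +1} and are obtained by taking the sign of real-valued network outputs; the function names (sign_binarize, hamming_distance, rank_by_hamming) and the toy values are hypothetical and do not come from the paper or its repository.

```python
from typing import List, Sequence


def sign_binarize(features: Sequence[float]) -> List[int]:
    """Map real-valued outputs to a {-1, +1} code (assumption: 0 maps to +1)."""
    return [1 if x >= 0 else -1 for x in features]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def hamming_from_inner_product(a: Sequence[int], b: Sequence[int]) -> int:
    """For {-1, +1} codes of length K, d_H(a, b) = (K - <a, b>) / 2."""
    inner = sum(x * y for x, y in zip(a, b))
    return (len(a) - inner) // 2


def rank_by_hamming(query: Sequence[int], database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(range(len(database)), key=lambda i: hamming_distance(query, database[i]))


# Toy example with 4-bit codes (illustrative values only).
query_code = sign_binarize([0.3, -1.2, 0.8, -0.1])
database_codes = [
    [1, -1, 1, -1],   # identical to the query code
    [-1, 1, -1, 1],   # differs in every bit
    [1, -1, -1, -1],  # differs in one bit
]
ranking = rank_by_hamming(query_code, database_codes)
```

Under the same assumptions, a class proxy would simply be one more code in this space: an image code can be compared against each proxy code with the same distance function, which is what allows proxies and images to share one similarity measure.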
