Deep Hamming Hashing Integrating Transformer and Attention Mechanism

Xinyi Wang·Dongsheng Guo·Weidong Zhang·Zehua Jia·Shan Xue·Wenbo Zhang·2026

Abstract

Recently, image retrieval from massive image collections has become a crucial research topic. Most current deep hashing methods rely on a convolutional neural network (CNN) as the backbone network. However, CNN-based deep hashing methods require many stacked convolution operations to capture global information. This leads to the loss of many shallow-level features during feature extraction and reduces the accuracy of the binary hash codes. To address this limitation, this paper proposes a new deep hashing method that adopts the vision transformer model as the backbone network and designs a spatial reduction layer fused with a channel attention mechanism. This design allows the method to capture long-range dependencies effectively. Experiments on the CIFAR-10, NUS-WIDE, and ImageNet datasets show that the proposed method achieves higher mean average precision (mAP) than other state-of-the-art hashing methods, confirming its effectiveness for image retrieval.
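The abstract evaluates binary hash codes by mean average precision. The minimal sketch below shows how such codes are commonly compared and scored in hashing work. It is not the authors' pipeline: it omits the vision transformer backbone and the channel-attention spatial reduction layer entirely. It assumes that continuous network outputs are binarized with sign(), that database items are ranked by Hamming distance, and that a retrieved item counts as relevant when it shares at least one label with the query (a common convention for multi-label sets such as NUS-WIDE). All function names (`binarize`, `hamming_distance`, `average_precision`, `mean_average_precision`) and the optional `top_k` cutoff are hypothetical.

```python
from typing import List, Optional, Sequence, Set


def binarize(features: Sequence[float]) -> List[int]:
    """Map a real-valued feature vector to a {-1, +1} hash code via sign() (0 maps to +1)."""
    return [1 if x >= 0 else -1 for x in features]


def hamming_distance(b1: Sequence[int], b2: Sequence[int]) -> int:
    """For {-1, +1} codes of length K, the Hamming distance is (K - <b1, b2>) / 2."""
    k = len(b1)
    inner = sum(x * y for x, y in zip(b1, b2))
    return (k - inner) // 2  # K - inner is always even for +/-1 codes


def average_precision(query_code: Sequence[int], query_labels: Set[int],
                      db_codes: List[List[int]], db_labels: List[Set[int]],
                      top_k: Optional[int] = None) -> float:
    """AP of one query: mean of precision@r over the ranks r where a relevant item appears."""
    # Rank database items by Hamming distance; Python's sort is stable, so ties keep index order.
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming_distance(query_code, db_codes[i]))
    if top_k is not None:
        order = order[:top_k]
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if query_labels & db_labels[idx]:  # relevant if at least one label is shared
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(q_codes: List[List[int]], q_labels: List[Set[int]],
                           db_codes: List[List[int]], db_labels: List[Set[int]],
                           top_k: Optional[int] = None) -> float:
    """mAP: the average of per-query AP values."""
    aps = [average_precision(c, l, db_codes, db_labels, top_k)
           for c, l in zip(q_codes, q_labels)]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Toy 4-bit codes; in practice these would come from the trained hashing network.
    db = [binarize(v) for v in ([0.9, -0.2, 0.4, -0.7],
                                [-0.5, 0.3, -0.1, 0.8],
                                [0.7, -0.6, 0.2, -0.1])]
    db_lab = [{0}, {1}, {0}]
    queries = [binarize(v) for v in ([0.5, -0.1, 0.3, -0.4],
                                     [-0.3, 0.2, -0.6, 0.1])]
    q_lab = [{0}, {0}]
    print(mean_average_precision(queries, q_lab, db, db_lab))
```

With sign binarization, Hamming distance reduces to an inner product, which makes ranking cheap for long codes. The label-sharing relevance rule and the `top_k` cutoff are assumptions here; the evaluation protocol actually used in the paper may differ.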

