Transhash: Transformer-based Hamming Hashing For Efficient Image Retrieval

Abstract

Deep Hamming hashing has gained growing popularity in approximate nearest neighbour search for large-scale image retrieval. Until now, deep hashing for image retrieval has been dominated by convolutional neural network architectures, e.g., ResNet \cite{he2016deep}. In this paper, inspired by recent advances in vision transformers, we present Transhash, a pure transformer-based framework for deep hashing learning. Concretely, our framework is composed of two major modules. (1) Based on the Vision Transformer (ViT), we design a siamese vision transformer backbone for image feature extraction. To learn fine-grained features, we introduce dual-stream feature learning on top of the transformer to learn discriminative global and local features. (2) We adopt a Bayesian learning scheme with a dynamically constructed similarity matrix to learn compact binary hash codes. The entire framework is jointly trained in an end-to-end manner.
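
The abstract does not spell out the hashing objective. As a rough illustration only, the sketch below assumes a DPSH-style pairwise Bayesian likelihood, p(s_ij = 1 | h_i, h_j) = sigmoid(alpha * <h_i, h_j>), where the similarity matrix S is rebuilt from each mini-batch's labels (assumed rule: s_ij = 1 if two images share at least one label). The codes h_i stand in for the continuous outputs of a hashing head on top of the backbone. Training minimizes the negative log-likelihood on these relaxed outputs. At retrieval time the codes are quantized with sign() and ranked by Hamming distance. All function names and the alpha = 0.5 scale are hypothetical, and the actual Transhash loss may include further terms (for example a quantization penalty).

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def similarity_matrix(labels: Sequence[Sequence[int]]) -> List[List[int]]:
    """Rebuild the pairwise similarity matrix for the current mini-batch.

    Assumed rule (hypothetical): s_ij = 1 if images i and j share at least
    one multi-hot label, otherwise 0.
    """
    n = len(labels)
    return [
        [1 if any(a and b for a, b in zip(labels[i], labels[j])) else 0
         for j in range(n)]
        for i in range(n)
    ]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_bayesian_loss(codes: Matrix, sim: List[List[int]],
                           alpha: float = 0.5) -> float:
    """Mean negative log-likelihood of the pairwise labels under the
    assumed model p(s_ij = 1 | h_i, h_j) = sigmoid(alpha * <h_i, h_j>).
    """
    n = len(codes)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self-pairs carry no information
            theta = alpha * sum(a * b for a, b in zip(codes[i], codes[j]))
            # -log p(s_ij | theta) = log(1 + e^theta) - s_ij * theta
            total += softplus(theta) - sim[i][j] * theta
            pairs += 1
    return total / pairs


def binarize(codes: Matrix) -> List[List[int]]:
    """sign() quantization into {-1, +1}; zeros are mapped to +1."""
    return [[1 if v >= 0 else -1 for v in c] for c in codes]


def hamming(b1: Sequence[int], b2: Sequence[int]) -> int:
    """Count differing bits; for K-bit +/-1 codes this equals (K - <b1, b2>) / 2."""
    return sum(1 for x, y in zip(b1, b2) if x != y)


if __name__ == "__main__":
    labels = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]  # toy multi-hot labels
    # Toy continuous hashing-head outputs with K = 4 bits.
    raw = [[0.9, -0.8, 0.7, 0.6],
           [0.8, -0.9, 0.6, 0.7],
           [-0.7, 0.9, -0.8, -0.6]]
    sim = similarity_matrix(labels)
    print(sim)
    print(pairwise_bayesian_loss(raw, sim))  # training loss on relaxed codes
    codes = binarize(raw)                    # binary codes for retrieval
    print(hamming(codes[0], codes[1]), hamming(codes[0], codes[2]))
```

Here images 0 and 1 share a label, so a well-trained model should place them at a small Hamming distance and image 2 far away. The plain loops are for readability; a real implementation would compute the inner products for the whole batch as one matrix product over the backbone's output tensors.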
