Abstract

Deep Hamming hashing has gained growing popularity in approximate nearest neighbour search for large-scale image retrieval. Until now, deep hashing for image retrieval has been dominated by convolutional neural network architectures, e.g. \texttt{ResNet} \cite{he2016deep}. In this paper, inspired by recent advances in vision transformers, we present \textbf{TransHash}, a pure transformer-based framework for deep hash learning. Concretely, our framework is composed of two major modules: (1) Based on the \textit{Vision Transformer} (ViT), we design a siamese vision transformer backbone for image feature extraction. To learn fine-grained features, we introduce dual-stream feature learning on top of the transformer to learn discriminative global and local features. (2) We adopt a Bayesian learning scheme with a dynamically constructed similarity matrix to learn compact binary hash codes. The entire framework is jointly trained in an end-to-end mann

Authors

(none)

Tags

  • Image Retrieval
  • Deep Hashing

Stats

  • citations: 61
  • S2 citations: -
  • GitHub stars: 0
  • HF likes: 0
  • heat score: 13.44
  • arXiv key: chen2021transhash

Related papers