
Deep Stochastic Spherical Hashing With Von Mises-Fisher Distributions for Cross-Modal Retrieval

Abstract

Deep cross-modal hashing is widely studied for its low storage cost and high retrieval efficiency. Despite recent progress, existing deep cross-modal hashing methods still face critical challenges. Most existing methods embed samples in Euclidean space to measure semantic similarity, but the volume of that space grows exponentially with dimension, worsening the curse of dimensionality. In contrast, methods based on spherical space usually use cosine similarity as the metric, which mitigates this problem by normalizing the embedding vectors. Nevertheless, such methods consider only the direction to determine the category and ignore uncertainty in the embedding space, which limits their ability to preserve inherent multimodal semantics. In this paper, building on a novel extension of the von Mises-Fisher (vMF) distribution, the maximum entropy distribution on the surface of a hypersphere, we design a novel deep cross-modal hashing method, named Deep Stochastic Spherical Hashing (DSSH), that uses uncertainty information to guide the hashing process and produce discriminative modality-invariant hash codes. Specifically, to learn explicit uncertainty in the learned embedding space, the spherical von Mises-Fisher distribution is applied for the first time in deep cross-modal hashing: the direction of a sample embedding controls its position on the hypersphere, thereby representing its semantic content, and its norm parameterizes the determinism (concentration) of the distribution. In addition, a stochastic spherical von Mises-Fisher loss is proposed to preserve the modality-specific semantic information of each sample, achieving alignment between the different modalities and the semantic embeddings. Extensive experiments on four benchmark datasets show that our DSSH framework outperforms existing state-of-the-art methods.
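
The direction/norm split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `vmf_params` and `vmf_log_density_3d` are hypothetical, and the density is restricted to the 3-dimensional case only because the vMF normalizer then has the closed form C_3(kappa) = kappa / (4 pi sinh kappa). The general d-dimensional normalizer involves a modified Bessel function and is omitted here.

```python
import numpy as np


def vmf_params(z, eps=1e-12):
    """Split raw embeddings z of shape (n, d) into vMF parameters.

    Assumption for illustration: the unit direction of z is the mean
    direction mu, and the norm of z is the concentration kappa.
    """
    kappa = np.linalg.norm(z, axis=-1, keepdims=True)
    mu = z / np.maximum(kappa, eps)  # guard against division by zero
    return mu, kappa.squeeze(-1)


def vmf_log_density_3d(x, mu, kappa):
    """Log-density of a vMF distribution on the unit sphere in R^3.

    Uses log sinh(k) = k + log(1 - exp(-2k)) - log(2), which stays
    stable for large k. Requires kappa > 0.
    """
    log_c = (np.log(kappa) - np.log(2.0 * np.pi)
             - kappa - np.log1p(-np.exp(-2.0 * kappa)))
    return log_c + kappa * np.sum(mu * x, axis=-1)


# Two hypothetical embeddings: a long one (confident) and a short one (uncertain).
z = np.array([[3.0, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
mu, kappa = vmf_params(z)

# Evaluate each distribution at its own mean direction.
log_p_at_mode = vmf_log_density_3d(mu, mu, kappa)
```

In this sketch the longer embedding yields a larger `kappa` and therefore a higher log-density at its own mean direction, which is the sense in which the norm encodes how certain the embedding is while the direction encodes where it sits on the sphere.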
