Deep Semantic Multimodal Hashing Network For Scalable Image-text And Video-text Retrievals
2019 · Lu Jin, Zechao Li, Jinhui Tang
Abstract
Hashing has been widely applied to multimodal retrieval on large-scale multimedia data because it is efficient in both computation and storage. In this article, we propose a novel deep semantic multimodal hashing network (DSMHN) for scalable image-text and video-text retrieval. The proposed deep hashing framework uses a 2-D convolutional neural network (CNN) as the backbone to capture spatial information for image-text retrieval, and a 3-D CNN as the backbone to capture spatial and temporal information for video-text retrieval. In the DSMHN, two sets of modality-specific hash functions are jointly learned by explicitly preserving both intermodality similarities and intramodality semantic labels. Specifically, under the assumption that the learned hash codes should be optimal for the classification task, two stream networks are jointly trained to learn the hash functions by embedding the semantic labels in the resulting hash codes. Moreover, a unified deep multim
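To illustrate why binary hash codes make cross-modal retrieval cheap, the following is a minimal sketch of ranking by Hamming distance. It is not the DSMHN itself: the sign-threshold binarization, the function names (`binarize`, `hamming`, `rank_by_hamming`), and the 4-dimensional feature vectors are all hypothetical and stand in for the outputs of trained text and image networks.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack the signs of a real-valued vector into an integer hash code.

    Assumption: a plain sign threshold at zero; the paper's actual
    quantization step is not described in the abstract above.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: Sequence[int]) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Hypothetical network outputs: one text query and three image entries.
text_query = binarize([0.9, -0.2, 0.4, -0.7])
image_db = [
    binarize([-0.5, 0.3, -0.1, 0.8]),
    binarize([0.6, -0.4, 0.2, -0.1]),
    binarize([0.7, 0.1, 0.3, -0.9]),
]

ranking = rank_by_hamming(text_query, image_db)
print(ranking)
```

Because each comparison is an XOR followed by a bit count, codes from different modalities can be compared directly once both networks map into the same Hamming space, which is the property the abstract's efficiency claim relies on.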
Related papers
- Unsupervised Semantic Deep Hashing (2018) · 10.48
- Unsupervised Deep Cross-modality Spectral Hashing (2020) · 11.39
- Transitive Hashing Network For Heterogeneous Multimedia Retrieval (2016) · 8.35
- Error-corrected Margin-based Deep Cross-modal Hashing For Facial Image Retrieval (2020) · 8.09
- Unsupervised Multi-modal Hashing For Cross-modal Retrieval (2019) · 8.35
- Correlation Hashing Network For Efficient Cross-modal Retrieval (2016) · 11.67
- Deep Hashing Learning For Visual And Semantic Retrieval Of Remote Sensing Images (2019) · 13.55
- Metric-learning Based Deep Hashing Network For Content Based Retrieval Of Remote Sensing Images (2019) · 13.93