Deep Binaries: Encoding Semantic-rich Cues For Efficient Textual-visual Cross Retrieval
2017 · Yuming Shen, Li Liu, Ling Shao, et al.
Abstract
Cross-modal hashing is usually regarded as an effective technique for large-scale textual-visual cross retrieval, in which data from different modalities are mapped into a shared Hamming space for matching. Most traditional textual-visual binary encoding methods consider only holistic image representations and fail to model descriptive sentences, which makes them ill-suited to handling the rich semantics of informative cross-modal data in high-quality textual-visual search tasks. To address the problem of hashing cross-modal data with semantic-rich cues, this paper develops a novel integrated deep architecture, named Textual-Visual Deep Binaries (TVDB), that effectively encodes the detailed semantics of informative images and long descriptive sentences. In particular, region-based convolutional networks with long short-term memory units are introduced to fully explore image regional details while semantic cues of sentences are modeled by a text convolutional networ
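The matching step in a shared Hamming space can be illustrated with a minimal sketch. It is not the TVDB architecture: the embeddings below are made-up values standing in for the outputs of trained text and image encoders, the sign-threshold binarization is an assumption, and the function names (`binarize`, `hamming`, `rank_by_hamming`) are hypothetical.

```python
from typing import List, Sequence


def binarize(embedding: Sequence[float]) -> int:
    """Pack the sign pattern of a real-valued vector into an integer bit code.

    Assumption: a positive value maps to bit 1, anything else to bit 0.
    """
    code = 0
    for value in embedding:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, database_codes: Sequence[int]) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )


# Hypothetical 4-dimensional encoder outputs (illustrative values only).
text_embedding = [0.7, -0.2, 0.1, -0.9]
image_embeddings = [
    [-0.5, 0.3, -0.4, 0.8],
    [0.6, -0.1, 0.2, -0.3],
    [0.9, 0.4, 0.1, -0.2],
]

query_code = binarize(text_embedding)
image_codes = [binarize(e) for e in image_embeddings]
ranking = rank_by_hamming(query_code, image_codes)
print(ranking)
```

Because both modalities end up as bit codes of the same length, a text query can be compared against image codes with XOR and a bit count, which is what makes this kind of retrieval cheap at scale.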
Related papers
- Deep Semantic Multimodal Hashing Network For Scalable Image-text And Video-text Retrievals (2019)
- Discriminative Cross-view Binary Representation Learning (2018)
- Unsupervised Deep Cross-modality Spectral Hashing (2020)
- A Survey On Deep Text Hashing: Efficient Semantic Text Retrieval With Binary Representation (2025)
- Video Retrieval Based On Deep Convolutional Neural Network (2017)
- Unsupervised Deep Hashing For Large-scale Visual Search (2016)
- Deep Multimodal Image-text Embeddings For Automatic Cross-media Retrieval (2020)
- Dual Encoding For Video Retrieval By Text (2020)