<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://awesomepapers.io/learning-to-hash/feed/publications.xml" rel="self" type="application/atom+xml" /><link href="https://awesomepapers.io/learning-to-hash/" rel="alternate" type="text/html" /><updated>2026-02-27T08:17:41-06:00</updated><id>https://awesomepapers.io/learning-to-hash/feed/publications.xml</id><title type="html">Awesome Learning to Hash | Publications</title><subtitle>A continuously updated collection of research papers on Learning to Hash. Maintained by &lt;a href=&quot;https://sjmoran.github.io&quot;&gt;Sean Moran&lt;/a&gt;.</subtitle><author><name>Sean Moran</name><email></email></author><entry><title type="html">Deep Hashing Network for Efficient Similarity Retrieval</title><link href="https://awesomepapers.io/learning-to-hash/publications/zhu2016deep/" rel="alternate" type="text/html" title="Deep Hashing Network for Efficient Similarity Retrieval" /><published>2026-02-27T08:17:41-06:00</published><updated>2026-02-27T08:17:41-06:00</updated><id>https://awesomepapers.io/learning-to-hash/publications/zhu2016deep</id><content type="html" xml:base="https://awesomepapers.io/learning-to-hash/publications/zhu2016deep/"><![CDATA[<p>Owing to its storage and retrieval efficiency, hashing has been widely deployed for approximate nearest neighbor search in large-scale multimedia retrieval. Supervised hashing, which improves the quality of hash coding by exploiting the semantic similarity of data pairs, has received increasing attention recently. In most existing supervised hashing methods for image retrieval, an image is first represented as a vector of hand-crafted or machine-learned features, followed by a separate quantization step that generates binary codes.
However, this two-step pipeline may produce suboptimal hash codes, because the quantization error is not statistically minimized and the feature representation is not optimally compatible with binary coding. In this paper, we propose a novel Deep Hashing Network (DHN) architecture for supervised hashing, in which we jointly learn a good image representation tailored to hash coding and formally control the quantization error.
The DHN model comprises four key components: (1) a sub-network with multiple convolution-pooling layers to capture image representations; (2) a fully-connected hashing layer to generate compact binary hash codes; (3) a pairwise cross-entropy loss layer for similarity-preserving learning; and (4) a pairwise quantization loss for controlling hashing quality. Extensive experiments on standard image retrieval datasets show that the proposed DHN model yields substantial gains over the latest state-of-the-art hashing methods.</p>]]></content><author><name>Sean Moran</name></author><category term="Deep Learning" /><category term="Image Retrieval" /><category term="Quantisation" /><category term="Has Code" /><category term="AAAI" /><summary type="html"><![CDATA[Owing to its storage and retrieval efficiency, hashing has been widely deployed for approximate nearest neighbor search in large-scale multimedia retrieval. Supervised hashing, which improves the quality of hash coding by exploiting the semantic similarity of data pairs, has received increasing attention recently. In most existing supervised hashing methods for image retrieval, an image is first represented as a vector of hand-crafted or machine-learned features, followed by a separate quantization step that generates binary codes. However, this two-step pipeline may produce suboptimal hash codes, because the quantization error is not statistically minimized and the feature representation is not optimally compatible with binary coding. In this paper, we propose a novel Deep Hashing Network (DHN) architecture for supervised hashing, in which we jointly learn a good image representation tailored to hash coding and formally control the quantization error.
The DHN model comprises four key components: (1) a sub-network with multiple convolution-pooling layers to capture image representations; (2) a fully-connected hashing layer to generate compact binary hash codes; (3) a pairwise cross-entropy loss layer for similarity-preserving learning; and (4) a pairwise quantization loss for controlling hashing quality. Extensive experiments on standard image retrieval datasets show that the proposed DHN model yields substantial gains over the latest state-of-the-art hashing methods.]]></summary></entry><entry><title type="html">Linear cross-modal hashing for efficient multimedia search</title><link href="https://awesomepapers.io/learning-to-hash/publications/zhu2013linear/" rel="alternate" type="text/html" title="Linear cross-modal hashing for efficient multimedia search" /><published>2026-02-27T08:17:41-06:00</published><updated>2026-02-27T08:17:41-06:00</updated><id>https://awesomepapers.io/learning-to-hash/publications/zhu2013linear</id><content type="html" xml:base="https://awesomepapers.io/learning-to-hash/publications/zhu2013linear/"><![CDATA[<p>Most existing cross-modal hashing methods suffer from poor scalability in the training phase. In this paper, we propose a novel
cross-modal hashing approach whose training time is linear in the training data size, to enable scalable indexing for multimedia
search across multiple modalities. Taking both the intra-similarity within each modality and the inter-similarity across different modalities
into consideration, the proposed approach aims to learn effective hash functions from large-scale training datasets.
More specifically, for each modality, we first partition the training data into $k$ clusters and then represent each training data
point by its distances to the $k$ cluster centroids. Interestingly, such a $k$-dimensional data representation can reduce
the time complexity of the training phase from the traditional $O(n^2)$ or higher to $O(n)$, where $n$ is the training data size, leading to
practical learning on large-scale datasets. We further prove that this new representation preserves the intra-similarity within each modality.
To preserve the inter-similarity among data points across different modalities, we transform the derived data representations into a
common binary subspace in which binary codes from all the modalities are “consistent” and comparable. The transformation simultaneously
outputs the hash functions for all modalities, which are used to convert unseen data into binary codes. Given a query from one modality,
it is first mapped into binary codes using that modality’s hash functions, which are then matched against the database binary codes of any other
modality. Experimental results on two benchmark datasets confirm the scalability and the effectiveness of the proposed approach in
comparison with the state of the art.</p>]]></content><author><name>Sean Moran</name></author><category term="MM" /><category term="Cross-Modal" /><summary type="html"><![CDATA[Most existing cross-modal hashing methods suffer from poor scalability in the training phase. In this paper, we propose a novel cross-modal hashing approach whose training time is linear in the training data size, to enable scalable indexing for multimedia search across multiple modalities. Taking both the intra-similarity within each modality and the inter-similarity across different modalities into consideration, the proposed approach aims to learn effective hash functions from large-scale training datasets. More specifically, for each modality, we first partition the training data into $k$ clusters and then represent each training data point by its distances to the $k$ cluster centroids. Interestingly, such a $k$-dimensional data representation can reduce the time complexity of the training phase from the traditional $O(n^2)$ or higher to $O(n)$, where $n$ is the training data size, leading to practical learning on large-scale datasets. We further prove that this new representation preserves the intra-similarity within each modality. To preserve the inter-similarity among data points across different modalities, we transform the derived data representations into a common binary subspace in which binary codes from all the modalities are “consistent” and comparable. The transformation simultaneously outputs the hash functions for all modalities, which are used to convert unseen data into binary codes. Given a query from one modality, it is first mapped into binary codes using that modality’s hash functions, which are then matched against the database binary codes of any other modality.
Experimental results on two benchmark datasets confirm the scalability and the effectiveness of the proposed approach in comparison with the state of the art.]]></summary></entry><entry><title type="html">Cross-Modal Similarity Learning via Pairs, Preferences, and Active Supervision</title><link href="https://awesomepapers.io/learning-to-hash/publications/zhen2015cross/" rel="alternate" type="text/html" title="Cross-Modal Similarity Learning via Pairs, Preferences, and Active Supervision" /><published>2026-02-27T08:17:41-06:00</published><updated>2026-02-27T08:17:41-06:00</updated><id>https://awesomepapers.io/learning-to-hash/publications/zhen2015cross</id><content type="html" xml:base="https://awesomepapers.io/learning-to-hash/publications/zhen2015cross/"><![CDATA[<p>We present a probabilistic framework for learning pairwise similarities between objects belonging to different modalities, such as drugs and proteins, or text and
images. Our framework is based on learning a binary-code-based
representation for objects in each modality, and has the following key properties: (i) it can
leverage both pairwise and easy-to-obtain relative-preference-based
cross-modal constraints, (ii) the probabilistic framework naturally allows querying for the
most useful/informative constraints, facilitating an active learning setting (existing methods for cross-modal
similarity learning do not have such a mechanism), and
(iii) the binary code length is learned from the data. We
demonstrate the effectiveness of the proposed approach
on two problems that require computing pairwise similarities between cross-modal object pairs: cross-modal
link prediction in bipartite graphs, and hashing-based
cross-modal similarity search.</p>]]></content><author><name>Sean Moran</name></author><category term="Cross-Modal" /><category term="AAAI" /><summary type="html"><![CDATA[We present a probabilistic framework for learning pairwise similarities between objects belonging to different modalities, such as drugs and proteins, or text and images. Our framework is based on learning a binary-code-based representation for objects in each modality, and has the following key properties: (i) it can leverage both pairwise and easy-to-obtain relative-preference-based cross-modal constraints, (ii) the probabilistic framework naturally allows querying for the most useful/informative constraints, facilitating an active learning setting (existing methods for cross-modal similarity learning do not have such a mechanism), and (iii) the binary code length is learned from the data. We demonstrate the effectiveness of the proposed approach on two problems that require computing pairwise similarities between cross-modal object pairs: cross-modal link prediction in bipartite graphs, and hashing-based cross-modal similarity search.]]></summary></entry><entry><title type="html">Co-Regularized Hashing for Multimodal Data</title><link href="https://awesomepapers.io/learning-to-hash/publications/zhen2012coregularised/" rel="alternate" type="text/html" title="Co-Regularized Hashing for Multimodal Data" /><published>2026-02-27T08:17:41-06:00</published><updated>2026-02-27T08:17:41-06:00</updated><id>https://awesomepapers.io/learning-to-hash/publications/zhen2012coregularised</id><content type="html" xml:base="https://awesomepapers.io/learning-to-hash/publications/zhen2012coregularised/"><![CDATA[<p>Hashing-based methods provide a very promising approach to large-scale similarity
search. To obtain compact hash codes, a recent trend seeks to learn the hash
functions from data automatically. In this paper, we study hash function learning
in the context of multimodal data. We propose a novel multimodal hash function
learning method, called Co-Regularized Hashing (CRH), based on a boosted co-regularization
framework. The hash functions for each bit of the hash codes are
learned by solving DC (difference of convex functions) programs, while the learning
for multiple bits proceeds via a boosting procedure so that the bias introduced
by the hash functions can be sequentially minimized. We empirically compare
CRH with two state-of-the-art multimodal hash function learning methods on two
publicly available data sets.</p>]]></content><author><name>Sean Moran</name></author><summary type="html"><![CDATA[Hashing-based methods provide a very promising approach to large-scale similarity search. To obtain compact hash codes, a recent trend seeks to learn the hash functions from data automatically. In this paper, we study hash function learning in the context of multimodal data. We propose a novel multimodal hash function learning method, called Co-Regularized Hashing (CRH), based on a boosted co-regularization framework. The hash functions for each bit of the hash codes are learned by solving DC (difference of convex functions) programs, while the learning for multiple bits proceeds via a boosting procedure so that the bias introduced by the hash functions can be sequentially minimized. We empirically compare CRH with two state-of-the-art multimodal hash function learning methods on two publicly available data sets.]]></summary></entry><entry><title type="html">Deep Semantic Ranking Based Hashing for Multi-Label Image Retrieval</title><link href="https://awesomepapers.io/learning-to-hash/publications/zhao2015deep/" rel="alternate" type="text/html" title="Deep Semantic Ranking Based Hashing for Multi-Label Image Retrieval" /><published>2026-02-27T08:17:41-06:00</published><updated>2026-02-27T08:17:41-06:00</updated><id>https://awesomepapers.io/learning-to-hash/publications/zhao2015deep</id><content type="html" xml:base="https://awesomepapers.io/learning-to-hash/publications/zhao2015deep/"><![CDATA[<p>With the rapid growth of web images, hashing has received
increasing interest in large-scale image retrieval.
Research efforts have been devoted to learning compact binary
codes that preserve semantic similarity based on labels.
However, most of these hashing methods are designed
to handle simple binary similarity. The complex multilevel
semantic structure of images associated with multiple labels
has not yet been well explored. Here we propose a deep
semantic ranking based method for learning hash functions
that preserve multilevel semantic similarity between multi-label
images. In our approach, a deep convolutional neural
network is incorporated into hash functions to jointly
learn feature representations and mappings from them to
hash codes, which avoids the limited semantic representation
power of hand-crafted features. Meanwhile, a
ranking list that encodes the multilevel similarity information
is employed to guide the learning of such deep hash
functions. An effective scheme based on surrogate loss is
used to solve the intractable optimization problem of nonsmooth
and multivariate ranking measures involved in the
learning procedure. Experimental results show the superiority
of our proposed approach over several state-of-the-art
hashing methods in terms of ranking evaluation metrics
when tested on multi-label image datasets.</p>]]></content><author><name>Sean Moran</name></author><category term="CVPR" /><category term="Deep Learning" /><category term="Image Retrieval" /><summary type="html"><![CDATA[With the rapid growth of web images, hashing has received increasing interest in large-scale image retrieval. Research efforts have been devoted to learning compact binary codes that preserve semantic similarity based on labels. However, most of these hashing methods are designed to handle simple binary similarity. The complex multilevel semantic structure of images associated with multiple labels has not yet been well explored. Here we propose a deep semantic ranking based method for learning hash functions that preserve multilevel semantic similarity between multi-label images. In our approach, a deep convolutional neural network is incorporated into hash functions to jointly learn feature representations and mappings from them to hash codes, which avoids the limited semantic representation power of hand-crafted features. Meanwhile, a ranking list that encodes the multilevel similarity information is employed to guide the learning of such deep hash functions. An effective scheme based on surrogate loss is used to solve the intractable optimization problem of nonsmooth and multivariate ranking measures involved in the learning procedure.
Experimental results show the superiority of our proposed approach over several state-of-the-art hashing methods in terms of ranking evaluation metrics when tested on multi-label image datasets.]]></summary></entry><entry><title type="html">Efficient Training of Very Deep Neural Networks for Supervised Hashing</title><link href="https://awesomepapers.io/learning-to-hash/publications/zhang2016efficient/" rel="alternate" type="text/html" title="Efficient Training of Very Deep Neural Networks for Supervised Hashing" /><published>2026-02-27T08:17:41-06:00</published><updated>2026-02-27T08:17:41-06:00</updated><id>https://awesomepapers.io/learning-to-hash/publications/zhang2016efficient</id><content type="html" xml:base="https://awesomepapers.io/learning-to-hash/publications/zhang2016efficient/"><![CDATA[<p>In this paper, we propose training very deep neural networks (DNNs) for supervised learning of hash codes. Existing methods in this context train relatively “shallow” networks limited by the issues arising in back propagation (e.g. vanishing gradients) as well as computational efficiency. We propose a novel and efficient training algorithm inspired by the alternating direction method of multipliers (ADMM) that overcomes some of these limitations. Our method decomposes the training process into independent layer-wise local updates through auxiliary variables. Empirically we observe that our training algorithm always converges and its computational complexity is linearly proportional to the number of edges in the networks. In practice, we manage to train DNNs with 64 hidden layers and 1024 nodes per layer for supervised hashing in about 3 hours using a single GPU.
Our proposed very deep supervised hashing (VDSH) method significantly outperforms the state-of-the-art on several benchmark datasets.</p>]]></content><author><name>Sean Moran</name></author><category term="Deep Learning" /><category term="CVPR" /><summary type="html"><![CDATA[In this paper, we propose training very deep neural networks (DNNs) for supervised learning of hash codes. Existing methods in this context train relatively “shallow” networks limited by the issues arising in back propagation (e.g. vanishing gradients) as well as computational efficiency. We propose a novel and efficient training algorithm inspired by the alternating direction method of multipliers (ADMM) that overcomes some of these limitations. Our method decomposes the training process into independent layer-wise local updates through auxiliary variables. Empirically we observe that our training algorithm always converges and its computational complexity is linearly proportional to the number of edges in the networks. In practice, we manage to train DNNs with 64 hidden layers and 1024 nodes per layer for supervised hashing in about 3 hours using a single GPU.
Our proposed very deep supervised hashing (VDSH) method significantly outperforms the state-of-the-art on several benchmark datasets.]]></summary></entry><entry><title type="html">Bit-Scalable Deep Hashing With Regularized Similarity Learning for Image Retrieval and Person Re-Identification</title><link href="https://awesomepapers.io/learning-to-hash/publications/zhang2015bit/" rel="alternate" type="text/html" title="Bit-Scalable Deep Hashing With Regularized Similarity Learning for Image Retrieval and Person Re-Identification" /><published>2026-02-27T08:17:41-06:00</published><updated>2026-02-27T08:17:41-06:00</updated><id>https://awesomepapers.io/learning-to-hash/publications/zhang2015bit</id><content type="html" xml:base="https://awesomepapers.io/learning-to-hash/publications/zhang2015bit/"><![CDATA[<p>Extracting informative image features and learning
effective approximate hashing functions are two crucial steps in
image retrieval. Conventional methods often study these two
steps separately, e.g., learning hash functions from a predefined
hand-crafted feature space. Meanwhile, the bit lengths of output
hashing codes are preset in most previous methods, neglecting the
significance level of different bits and restricting their practical
flexibility. To address these issues, we propose a supervised
learning framework to generate compact and bit-scalable hashing
codes directly from raw images. We pose hashing learning as
a problem of regularized similarity learning. Specifically, we
organize the training images into a batch of triplet samples,
each sample containing two images with the same label and one
with a different label. With these triplet samples, we maximize
the margin between matched pairs and mismatched pairs in the
Hamming space. In addition, a regularization term is introduced
to enforce the adjacency consistency, i.e., images of similar
appearances should have similar codes. The deep convolutional
neural network is utilized to train the model in an end-to-end
fashion, where discriminative image features and hash functions
are simultaneously optimized. Furthermore, each bit of our
hashing codes is unequally weighted so that we can manipulate
the code lengths by truncating the insignificant bits. Our
framework outperforms state-of-the-art methods on public benchmarks
of similar image search and also achieves promising results in
the application of person re-identification in surveillance. It is
also shown that the generated bit-scalable hashing codes
preserve their discriminative power well even at shorter code lengths.</p>]]></content><author><name>Sean Moran</name></author><summary type="html"><![CDATA[Extracting informative image features and learning effective approximate hashing functions are two crucial steps in image retrieval. Conventional methods often study these two steps separately, e.g., learning hash functions from a predefined hand-crafted feature space. Meanwhile, the bit lengths of output hashing codes are preset in most previous methods, neglecting the significance level of different bits and restricting their practical flexibility. To address these issues, we propose a supervised learning framework to generate compact and bit-scalable hashing codes directly from raw images. We pose hashing learning as a problem of regularized similarity learning. Specifically, we organize the training images into a batch of triplet samples, each sample containing two images with the same label and one with a different label. With these triplet samples, we maximize the margin between matched pairs and mismatched pairs in the Hamming space. In addition, a regularization term is introduced to enforce the adjacency consistency, i.e., images of similar appearances should have similar codes. The deep convolutional neural network is utilized to train the model in an end-to-end fashion, where discriminative image features and hash functions are simultaneously optimized. Furthermore, each bit of our hashing codes is unequally weighted so that we can manipulate the code lengths by truncating the insignificant bits. Our framework outperforms state-of-the-art methods on public benchmarks of similar image search and also achieves promising results in the application of person re-identification in surveillance.
It is also shown that the generated bit-scalable hashing codes preserve their discriminative power well even at shorter code lengths.]]></summary></entry><entry><title type="html">Supervised Hashing with Latent Factor Models</title><link href="https://awesomepapers.io/learning-to-hash/publications/zhang2014latent/" rel="alternate" type="text/html" title="Supervised Hashing with Latent Factor Models" /><published>2026-02-27T08:17:41-06:00</published><updated>2026-02-27T08:17:41-06:00</updated><id>https://awesomepapers.io/learning-to-hash/publications/zhang2014latent</id><content type="html" xml:base="https://awesomepapers.io/learning-to-hash/publications/zhang2014latent/"><![CDATA[<p>Due to its low storage cost and fast query speed, hashing
has been widely adopted for approximate nearest neighbor
search in large-scale datasets. Traditional hashing methods
try to learn the hash codes in an unsupervised way where
the metric (Euclidean) structure of the training data is preserved.
Very recently, supervised hashing methods, which
try to preserve the semantic structure constructed from the
semantic labels of the training points, have exhibited higher
accuracy than unsupervised methods. In this paper, we
propose a novel supervised hashing method, called latent
factor hashing (LFH), to learn similarity-preserving binary
codes based on latent factor models. An algorithm with
a convergence guarantee is proposed to learn the parameters
of LFH. Furthermore, a linear-time variant with stochastic
learning is proposed for training LFH on large-scale datasets.
Experimental results on two large datasets with semantic
labels show that LFH can achieve higher accuracy than
state-of-the-art methods with comparable training time.</p>]]></content><author><name>Sean Moran</name></author><summary type="html"><![CDATA[Due to its low storage cost and fast query speed, hashing has been widely adopted for approximate nearest neighbor search in large-scale datasets. Traditional hashing methods try to learn the hash codes in an unsupervised way where the metric (Euclidean) structure of the training data is preserved. Very recently, supervised hashing methods, which try to preserve the semantic structure constructed from the semantic labels of the training points, have exhibited higher accuracy than unsupervised methods. In this paper, we propose a novel supervised hashing method, called latent factor hashing (LFH), to learn similarity-preserving binary codes based on latent factor models. An algorithm with a convergence guarantee is proposed to learn the parameters of LFH. Furthermore, a linear-time variant with stochastic learning is proposed for training LFH on large-scale datasets. Experimental results on two large datasets with semantic labels show that LFH can achieve higher accuracy than state-of-the-art methods with comparable training time.]]></summary></entry><entry><title type="html">Large-scale supervised multimodal hashing with semantic correlation maximization</title><link href="https://awesomepapers.io/learning-to-hash/publications/zhang2014largescale/" rel="alternate" type="text/html" title="Large-scale supervised multimodal hashing with semantic correlation maximization" /><published>2026-02-27T08:17:41-06:00</published><updated>2026-02-27T08:17:41-06:00</updated><id>https://awesomepapers.io/learning-to-hash/publications/zhang2014largescale</id><content type="html" xml:base="https://awesomepapers.io/learning-to-hash/publications/zhang2014largescale/"><![CDATA[<p>Due to its low storage cost and fast query speed, hashing
has been widely adopted for similarity search in multimedia
data. In particular, more and more attention
has been paid to multimodal hashing for search in
multimedia data with multiple modalities, such as images
with tags. Typically, supervised information of semantic
labels is also available for the data points in
many real applications. Hence, many supervised multimodal
hashing (SMH) methods have been proposed
to utilize such semantic labels to further improve the
search accuracy. However, the training time complexity
of most existing SMH methods is too high, which
makes them unscalable to large-scale datasets. In this
paper, a novel SMH method, called semantic correlation
maximization (SCM), is proposed to seamlessly integrate
semantic labels into the hashing learning procedure
for large-scale data modeling. Experimental results
on two real-world datasets show that SCM can significantly
outperform the state-of-the-art SMH methods, in
terms of both accuracy and scalability.</p>]]></content><author><name>Sean Moran</name></author><category term="Cross-Modal" /><category term="AAAI" /><category term="Has Code" /><category term="Supervised" /><summary type="html"><![CDATA[Due to its low storage cost and fast query speed, hashing has been widely adopted for similarity search in multimedia data. In particular, more and more attention has been paid to multimodal hashing for search in multimedia data with multiple modalities, such as images with tags. Typically, supervised information of semantic labels is also available for the data points in many real applications. Hence, many supervised multimodal hashing (SMH) methods have been proposed to utilize such semantic labels to further improve the search accuracy. However, the training time complexity of most existing SMH methods is too high, which makes them unscalable to large-scale datasets. In this paper, a novel SMH method, called semantic correlation maximization (SCM), is proposed to seamlessly integrate semantic labels into the hashing learning procedure for large-scale data modeling. Experimental results on two real-world datasets show that SCM can significantly outperform the state-of-the-art SMH methods, in terms of both accuracy and scalability.]]></summary></entry><entry><title type="html">Composite Hashing with Multiple Information Sources</title><link href="https://awesomepapers.io/learning-to-hash/publications/zhang2011composite/" rel="alternate" type="text/html" title="Composite Hashing with Multiple Information Sources" /><published>2026-02-27T08:17:41-06:00</published><updated>2026-02-27T08:17:41-06:00</updated><id>https://awesomepapers.io/learning-to-hash/publications/zhang2011composite</id><content type="html" xml:base="https://awesomepapers.io/learning-to-hash/publications/zhang2011composite/"><![CDATA[<p>Similarity search applications with a large amount of text
and image data demand an efficient and effective solution.
One useful strategy is to represent the examples in databases
as compact binary codes through semantic hashing, which
has attracted much attention due to its fast query/search
speed and drastically reduced storage requirement. All of
the current semantic hashing methods only deal with the
case in which each example is represented by a single type of feature.
However, examples are often described by several
different information sources in many real-world applications.
For example, the characteristics of a webpage can be
derived from both its content and its associated links.
To address the problem of learning good hashing codes in
this scenario, we propose a novel research problem – Composite
Hashing with Multiple Information Sources (CHMIS).
The focus of the new research problem is to design an algorithm
for incorporating the features from different information
sources into the binary hashing codes efficiently and
effectively. In particular, we propose an algorithm, CHMIS-AW
(CHMIS with Adjusted Weights) for learning the codes.
The proposed algorithm integrates information from several
different sources into the binary hashing codes by adjusting
the weights on each individual source for maximizing
the coding performance, and enables fast conversion from
query examples to their binary hashing codes. Experimental
results on five different datasets demonstrate the superior
performance of the proposed method against several other
state-of-the-art semantic hashing techniques.</p>]]></content><author><name>Sean Moran</name></author><summary type="html"><![CDATA[Similarity search applications with a large amount of text and image data demand an efficient and effective solution. One useful strategy is to represent the examples in databases as compact binary codes through semantic hashing, which has attracted much attention due to its fast query/search speed and drastically reduced storage requirement. All of the current semantic hashing methods only deal with the case in which each example is represented by a single type of feature. However, examples are often described by several different information sources in many real-world applications. For example, the characteristics of a webpage can be derived from both its content and its associated links. To address the problem of learning good hashing codes in this scenario, we propose a novel research problem – Composite Hashing with Multiple Information Sources (CHMIS). The focus of the new research problem is to design an algorithm for incorporating the features from different information sources into the binary hashing codes efficiently and effectively. In particular, we propose an algorithm, CHMIS-AW (CHMIS with Adjusted Weights) for learning the codes. The proposed algorithm integrates information from several different sources into the binary hashing codes by adjusting the weights on each individual source for maximizing the coding performance, and enables fast conversion from query examples to their binary hashing codes. Experimental results on five different datasets demonstrate the superior performance of the proposed method against several other state-of-the-art semantic hashing techniques.]]></summary></entry></feed>