<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://awesomepapers.io/similarity-search/feed/publications.xml" rel="self" type="application/atom+xml" /><link href="https://awesomepapers.io/similarity-search/" rel="alternate" type="text/html" /><updated>2026-04-22T09:32:52-05:00</updated><id>https://awesomepapers.io/similarity-search/feed/publications.xml</id><title type="html">Awesome Similarity Search Papers | Publications</title><subtitle>A continuously updated collection of research papers on vector search, similarity retrieval, ANN algorithms, and embedding indexing. Maintained by the Awesome Papers team.</subtitle><author><name>Awesome Papers</name><email></email></author><entry><title type="html">Self-supervised Bernoulli Autoencoders For Semi-supervised Hashing</title><link href="https://awesomepapers.io/similarity-search/publications/%C3%B1anculef2020self/" rel="alternate" type="text/html" title="Self-supervised Bernoulli Autoencoders For Semi-supervised Hashing" /><published>2026-04-22T09:32:52-05:00</published><updated>2026-04-22T09:32:52-05:00</updated><id>https://awesomepapers.io/similarity-search/publications/%C3%B1anculef2020self</id><content type="html" xml:base="https://awesomepapers.io/similarity-search/publications/%C3%B1anculef2020self/"><![CDATA[<p>Semantic hashing is an emerging technique for large-scale similarity search
based on representing high-dimensional data using similarity-preserving binary
codes that can be used for efficient indexing and search. It has recently been shown that
variational autoencoders, with Bernoulli latent representations parametrized by
neural nets, can be successfully trained to learn such codes in supervised and
unsupervised scenarios, improving on more traditional methods thanks to their
ability to handle the binary constraints architecturally. However, the scenario
where labels are scarce has not been studied yet.
  This paper investigates the robustness of hashing methods based on
variational autoencoders to the lack of supervision, focusing on two
semi-supervised approaches currently in use. The first augments the variational
autoencoder’s training objective to jointly model the distribution over the
data and the class labels. The second approach exploits the annotations to
define an additional pairwise loss that enforces consistency between the
similarity in the code (Hamming) space and the similarity in the label space.
Our experiments show that both methods can significantly increase the hash
codes’ quality. The pairwise approach can exhibit an advantage when the number
of labelled points is large. However, we found that this method degrades
quickly and loses its advantage as the number of labelled samples decreases. To circumvent
this problem, we propose a novel supervision method in which the model uses its
label distribution predictions to implement the pairwise objective. Compared to
the best baseline, this procedure yields similar performance in fully
supervised settings but improves the results significantly when labelled data
is scarce. Our code is made publicly available at
https://github.com/amacaluso/SSB-VAE.</p>]]></content><author><name>Awesome Papers</name></author><category term="Supervised Hashing" /><summary type="html"><![CDATA[Semantic hashing is an emerging technique for large-scale similarity search based on representing high-dimensional data using similarity-preserving binary codes used for efficient indexing and search. It has recently been shown that variational autoencoders, with Bernoulli latent representations parametrized by neural nets, can be successfully trained to learn such codes in supervised and unsupervised scenarios, improving on more traditional methods thanks to their ability to handle the binary constraints architecturally. However, the scenario where labels are scarce has not been studied yet. This paper investigates the robustness of hashing methods based on variational autoencoders to the lack of supervision, focusing on two semi-supervised approaches currently in use. The first augments the variational autoencoder’s training objective to jointly model the distribution over the data and the class labels. The second approach exploits the annotations to define an additional pairwise loss that enforces consistency between the similarity in the code (Hamming) space and the similarity in the label space. Our experiments show that both methods can significantly increase the hash codes’ quality. The pairwise approach can exhibit an advantage when the number of labelled points is large. However, we found that this method degrades quickly and loses its advantage when labelled samples decrease. To circumvent this problem, we propose a novel supervision method in which the model uses its label distribution predictions to implement the pairwise objective. Compared to the best baseline, this procedure yields similar performance in fully supervised settings but improves the results significantly when labelled data is scarce. 
Our code is made publicly available at https://github.com/amacaluso/SSB-VAE.]]></summary></entry><entry><title type="html">Infinite-dimensional Mahalanobis Distance With Applications To Kernelized Novelty Detection</title><link href="https://awesomepapers.io/similarity-search/publications/zozoulenko2024infinite/" rel="alternate" type="text/html" title="Infinite-dimensional Mahalanobis Distance With Applications To Kernelized Novelty Detection" /><published>2026-04-22T09:32:52-05:00</published><updated>2026-04-22T09:32:52-05:00</updated><id>https://awesomepapers.io/similarity-search/publications/zozoulenko2024infinite</id><content type="html" xml:base="https://awesomepapers.io/similarity-search/publications/zozoulenko2024infinite/"><![CDATA[<p>The Mahalanobis distance is a classical tool used to measure the covariance-adjusted distance between points in \(\mathbb{R}^d\). In this work, we extend the concept of Mahalanobis distance to separable Banach spaces by reinterpreting it as a Cameron-Martin norm associated with a probability measure. This approach leads to a basis-free, data-driven notion of anomaly distance through the so-called variance norm, which can naturally be estimated using empirical measures of a sample. Our framework generalizes the classical \(\mathbb{R}^d\), functional \((L^2[0,1])^d\), and kernelized settings; importantly, it incorporates non-injective covariance operators. We prove that the variance norm is invariant under invertible bounded linear transformations of the data, extending previous results which are limited to unitary operators. In the Hilbert space setting, we connect the variance norm to the RKHS of the covariance operator, and establish consistency and convergence results for estimation using empirical measures with Tikhonov regularization. Using the variance norm, we introduce the notion of a kernelized nearest-neighbour Mahalanobis distance, and study some of its finite-sample concentration properties. 
In an empirical study on 12 real-world data sets, we demonstrate that the kernelized nearest-neighbour Mahalanobis distance outperforms the traditional kernelized Mahalanobis distance for multivariate time series novelty detection, using state-of-the-art time series kernels such as the signature, global alignment, and Volterra reservoir kernels.</p>]]></content><author><name>Awesome Papers</name></author><category term="Uncategorized" /><summary type="html"><![CDATA[The Mahalanobis distance is a classical tool used to measure the covariance-adjusted distance between points in \(\mathbb{R}^d\). In this work, we extend the concept of Mahalanobis distance to separable Banach spaces by reinterpreting it as a Cameron-Martin norm associated with a probability measure. This approach leads to a basis-free, data-driven notion of anomaly distance through the so-called variance norm, which can naturally be estimated using empirical measures of a sample. Our framework generalizes the classical \(\mathbb{R}^d\), functional \((L^2[0,1])^d\), and kernelized settings; importantly, it incorporates non-injective covariance operators. We prove that the variance norm is invariant under invertible bounded linear transformations of the data, extending previous results which are limited to unitary operators. In the Hilbert space setting, we connect the variance norm to the RKHS of the covariance operator, and establish consistency and convergence results for estimation using empirical measures with Tikhonov regularization. Using the variance norm, we introduce the notion of a kernelized nearest-neighbour Mahalanobis distance, and study some of its finite-sample concentration properties. 
In an empirical study on 12 real-world data sets, we demonstrate that the kernelized nearest-neighbour Mahalanobis distance outperforms the traditional kernelized Mahalanobis distance for multivariate time series novelty detection, using state-of-the-art time series kernels such as the signature, global alignment, and Volterra reservoir kernels.]]></summary></entry><entry><title type="html">RNSG: A Range-aware Graph Index For Efficient Range-filtered Approximate Nearest Neighbor Search</title><link href="https://awesomepapers.io/similarity-search/publications/zou2026rnsg/" rel="alternate" type="text/html" title="RNSG: A Range-aware Graph Index For Efficient Range-filtered Approximate Nearest Neighbor Search" /><published>2026-04-22T09:32:52-05:00</published><updated>2026-04-22T09:32:52-05:00</updated><id>https://awesomepapers.io/similarity-search/publications/zou2026rnsg</id><content type="html" xml:base="https://awesomepapers.io/similarity-search/publications/zou2026rnsg/"><![CDATA[<p>Range-filtered approximate nearest neighbor (RFANN) search is a fundamental operation in modern data systems. Given a set of objects, each with a vector and a numerical attribute, an RFANN query retrieves the nearest neighbors to a query vector among those objects whose numerical attributes fall within the range specified by the query. Existing state-of-the-art methods for RFANN search often require constructing multiple range-specific graph indexes to achieve high query performance, which incurs significant indexing overhead. To address this, we first establish a novel graph indexing theory, the range-aware relative neighborhood graph (RRNG), which jointly considers spatial and attribute proximity. 
We prove that the RRNG satisfies two crucial properties: (1) monotonic searchability, which ensures correct nearest neighbor retrieval via beam search; and (2) structural heredity, which guarantees that any range-induced subgraph remains a valid RRNG, thus enabling efficient search with a single graph index. Based on this theoretical foundation, we propose a new graph index called RNSG as a practical solution that efficiently approximates RRNG. We develop fast algorithms for both constructing the RNSG index and processing RFANN queries with it. Extensive experiments on five real-world datasets show that RNSG achieves significantly higher query performance with a more compact index and lower construction cost than existing state-of-the-art methods.</p>]]></content><author><name>Awesome Papers</name></author><category term="ANN Search" /><summary type="html"><![CDATA[Range-filtered approximate nearest neighbor (RFANN) search is a fundamental operation in modern data systems. Given a set of objects, each with a vector and a numerical attribute, an RFANN query retrieves the nearest neighbors to a query vector among those objects whose numerical attributes fall within the range specified by the query. Existing state-of-the-art methods for RFANN search often require constructing multiple range-specific graph indexes to achieve high query performance, which incurs significant indexing overhead. To address this, we first establish a novel graph indexing theory, the range-aware relative neighborhood graph (RRNG), which jointly considers spatial and attribute proximity. We prove that the RRNG satisfies two crucial properties: (1) monotonic searchability, which ensures correct nearest neighbor retrieval via beam search; and (2) structural heredity, which guarantees that any range-induced subgraph remains a valid RRNG, thus enabling efficient search with a single graph index. 
Based on this theoretical foundation, we propose a new graph index called RNSG as a practical solution that efficiently approximates RRNG. We develop fast algorithms for both constructing the RNSG index and processing RFANN queries with it. Extensive experiments on five real-world datasets show that RNSG achieves significantly higher query performance with a more compact index and lower construction cost than existing state-of-the-art methods.]]></summary></entry><entry><title type="html">Prompthash: Affinity-prompted Collaborative Cross-modal Learning For Adaptive Hashing Retrieval</title><link href="https://awesomepapers.io/similarity-search/publications/zou2025prompthash/" rel="alternate" type="text/html" title="Prompthash: Affinity-prompted Collaborative Cross-modal Learning For Adaptive Hashing Retrieval" /><published>2026-04-22T09:32:52-05:00</published><updated>2026-04-22T09:32:52-05:00</updated><id>https://awesomepapers.io/similarity-search/publications/zou2025prompthash</id><content type="html" xml:base="https://awesomepapers.io/similarity-search/publications/zou2025prompthash/"><![CDATA[<p>Cross-modal hashing is a promising approach for efficient data retrieval and
storage optimization. However, contemporary methods exhibit significant
limitations in semantic preservation, contextual integrity, and information
redundancy, which constrain retrieval efficacy. We present PromptHash, an
innovative framework leveraging affinity prompt-aware collaborative learning
for adaptive cross-modal hashing. We propose an end-to-end framework for
affinity-prompted collaborative hashing, with the following fundamental
technical contributions: (i) a text affinity prompt learning mechanism that
preserves contextual information while maintaining parameter efficiency, (ii)
an adaptive gated selection fusion architecture that synthesizes State Space
Model with Transformer network for precise cross-modal feature integration, and
(iii) a prompt affinity alignment strategy that bridges modal heterogeneity
through hierarchical contrastive learning. To the best of our knowledge, this
study presents the first investigation into affinity prompt awareness within
collaborative cross-modal adaptive hash learning, establishing a paradigm for
enhanced semantic consistency across modalities. Through comprehensive
evaluation on three benchmark multi-label datasets, PromptHash demonstrates
substantial performance improvements over existing approaches. Notably, on the
NUS-WIDE dataset, our method achieves significant gains of 18.22% and 18.65% in
image-to-text and text-to-image retrieval tasks, respectively. The code is
publicly available at https://github.com/ShiShuMo/PromptHash.</p>]]></content><author><name>Awesome Papers</name></author><category term="Image Retrieval" /><category term="Cross-Modal Hashing" /><category term="Survey Paper" /><summary type="html"><![CDATA[Cross-modal hashing is a promising approach for efficient data retrieval and storage optimization. However, contemporary methods exhibit significant limitations in semantic preservation, contextual integrity, and information redundancy, which constrains retrieval efficacy. We present PromptHash, an innovative framework leveraging affinity prompt-aware collaborative learning for adaptive cross-modal hashing. We propose an end-to-end framework for affinity-prompted collaborative hashing, with the following fundamental technical contributions: (i) a text affinity prompt learning mechanism that preserves contextual information while maintaining parameter efficiency, (ii) an adaptive gated selection fusion architecture that synthesizes State Space Model with Transformer network for precise cross-modal feature integration, and (iii) a prompt affinity alignment strategy that bridges modal heterogeneity through hierarchical contrastive learning. To the best of our knowledge, this study presents the first investigation into affinity prompt awareness within collaborative cross-modal adaptive hash learning, establishing a paradigm for enhanced semantic consistency across modalities. Through comprehensive evaluation on three benchmark multi-label datasets, PromptHash demonstrates substantial performance improvements over existing approaches. Notably, on the NUS-WIDE dataset, our method achieves significant gains of 18.22% and 18.65% in image-to-text and text-to-image retrieval tasks, respectively. 
The code is publicly available at https://github.com/ShiShuMo/PromptHash.]]></summary></entry><entry><title type="html">Transductive Zero-shot Hashing For Multilabel Image Retrieval</title><link href="https://awesomepapers.io/similarity-search/publications/zou2019transductive/" rel="alternate" type="text/html" title="Transductive Zero-shot Hashing For Multilabel Image Retrieval" /><published>2026-04-22T09:32:52-05:00</published><updated>2026-04-22T09:32:52-05:00</updated><id>https://awesomepapers.io/similarity-search/publications/zou2019transductive</id><content type="html" xml:base="https://awesomepapers.io/similarity-search/publications/zou2019transductive/"><![CDATA[<p>Hash coding has been widely used in approximate nearest neighbor search for
large-scale image retrieval. Given semantic annotations such as class labels
and pairwise similarities of the training data, hashing methods can learn and
generate effective and compact binary codes. Since some newly introduced images
may contain undefined semantic labels, which we call unseen images, zero-shot
hashing techniques have been studied. However, existing zero-shot hashing
methods focus on the retrieval of single-label images, and cannot handle
multi-label images. In this paper, for the first time, a novel transductive
zero-shot hashing method is proposed for multi-label unseen image retrieval. In
order to predict the labels of the unseen/target data, a visual-semantic bridge
is built via instance-concept coherence ranking on the seen/source data. Then,
pairwise similarity loss and focal quantization loss are constructed for
training a hashing model using both the seen/source and unseen/target data.
Extensive evaluations on three popular multi-label datasets demonstrate that
the proposed hashing method achieves significantly better results than the
competing methods.</p>]]></content><author><name>Awesome Papers</name></author><category term="Image Retrieval" /><category term="ANN Search" /><summary type="html"><![CDATA[Hash coding has been widely used in approximate nearest neighbor search for large-scale image retrieval. Given semantic annotations such as class labels and pairwise similarities of the training data, hashing methods can learn and generate effective and compact binary codes. Since some newly introduced images may contain undefined semantic labels, which we call unseen images, zero-shot hashing techniques have been studied. However, existing zero-shot hashing methods focus on the retrieval of single-label images, and cannot handle multi-label images. In this paper, for the first time, a novel transductive zero-shot hashing method is proposed for multi-label unseen image retrieval. In order to predict the labels of the unseen/target data, a visual-semantic bridge is built via instance-concept coherence ranking on the seen/source data. Then, pairwise similarity loss and focal quantization loss are constructed for training a hashing model using both the seen/source and unseen/target data. 
Extensive evaluations on three popular multi-label datasets demonstrate that the proposed hashing method achieves significantly better results than the competing methods.]]></summary></entry><entry><title type="html">Learning Deep Nearest Neighbor Representations Using Differentiable Boundary Trees</title><link href="https://awesomepapers.io/similarity-search/publications/zoran2017learning/" rel="alternate" type="text/html" title="Learning Deep Nearest Neighbor Representations Using Differentiable Boundary Trees" /><published>2026-04-22T09:32:52-05:00</published><updated>2026-04-22T09:32:52-05:00</updated><id>https://awesomepapers.io/similarity-search/publications/zoran2017learning</id><content type="html" xml:base="https://awesomepapers.io/similarity-search/publications/zoran2017learning/"><![CDATA[<p>Nearest neighbor (kNN) methods have been gaining popularity in recent years
in light of advances in hardware and efficiency of algorithms. There is a
plethora of methods to choose from today, each with their own advantages and
disadvantages. One requirement shared between all kNN based methods is the need
for a good representation and distance measure between samples.
  We introduce a new method called differentiable boundary tree which allows
for learning deep kNN representations. We build on the recently proposed
boundary tree algorithm which allows for efficient nearest neighbor
classification, regression and retrieval. By modelling traversals in the tree
as stochastic events, we are able to form a differentiable cost function which
is associated with the tree’s predictions. Using a deep neural network to
transform the data and back-propagating through the tree allows us to learn
good representations for kNN methods.
  We demonstrate that our method is able to learn suitable representations
allowing for very efficient trees with a clearly interpretable structure.</p>]]></content><author><name>Awesome Papers</name></author><category term="Uncategorized" /><summary type="html"><![CDATA[Nearest neighbor (kNN) methods have been gaining popularity in recent years in light of advances in hardware and efficiency of algorithms. There is a plethora of methods to choose from today, each with their own advantages and disadvantages. One requirement shared between all kNN based methods is the need for a good representation and distance measure between samples. We introduce a new method called differentiable boundary tree which allows for learning deep kNN representations. We build on the recently proposed boundary tree algorithm which allows for efficient nearest neighbor classification, regression and retrieval. By modelling traversals in the tree as stochastic events, we are able to form a differentiable cost function which is associated with the tree’s predictions. Using a deep neural network to transform the data and back-propagating through the tree allows us to learn good representations for kNN methods. We demonstrate that our method is able to learn suitable representations allowing for very efficient trees with a clearly interpretable structure.]]></summary></entry><entry><title type="html">Bingan: Learning Compact Binary Descriptors With A Regularized GAN</title><link href="https://awesomepapers.io/similarity-search/publications/zieba2018bingan/" rel="alternate" type="text/html" title="Bingan: Learning Compact Binary Descriptors With A Regularized GAN" /><published>2026-04-22T09:32:52-05:00</published><updated>2026-04-22T09:32:52-05:00</updated><id>https://awesomepapers.io/similarity-search/publications/zieba2018bingan</id><content type="html" xml:base="https://awesomepapers.io/similarity-search/publications/zieba2018bingan/"><![CDATA[<p>In this paper, we propose a novel regularization method for Generative
Adversarial Networks, which allows the model to learn discriminative yet
compact binary representations of image patches (image descriptors). We employ
the dimensionality reduction that takes place in the intermediate layers of the
discriminator network and train a binarized low-dimensional representation of the
penultimate layer to mimic the distribution of the higher-dimensional preceding
layers. To achieve this, we introduce two loss terms that aim at: (i) reducing
the correlation between the dimensions of the binarized low-dimensional
representation of the penultimate layer (i.e., maximizing joint entropy) and
(ii) propagating the relations between the dimensions in the high-dimensional
space to the low-dimensional space. We evaluate the resulting binary image
descriptors on two challenging applications, image matching and retrieval, and
achieve state-of-the-art results.</p>]]></content><author><name>Awesome Papers</name></author><category term="Uncategorized" /><summary type="html"><![CDATA[In this paper, we propose a novel regularization method for Generative Adversarial Networks, which allows the model to learn discriminative yet compact binary representations of image patches (image descriptors). We employ the dimensionality reduction that takes place in the intermediate layers of the discriminator network and train a binarized low-dimensional representation of the penultimate layer to mimic the distribution of the higher-dimensional preceding layers. To achieve this, we introduce two loss terms that aim at: (i) reducing the correlation between the dimensions of the binarized low-dimensional representation of the penultimate layer (i.e., maximizing joint entropy) and (ii) propagating the relations between the dimensions in the high-dimensional space to the low-dimensional space. We evaluate the resulting binary image descriptors on two challenging applications, image matching and retrieval, and achieve state-of-the-art results.]]></summary></entry><entry><title type="html">Episode-specific Fine-tuning For Metric-based Few-shot Learners With Optimization-based Training</title><link href="https://awesomepapers.io/similarity-search/publications/zhuang2025episode/" rel="alternate" type="text/html" title="Episode-specific Fine-tuning For Metric-based Few-shot Learners With Optimization-based Training" /><published>2026-04-22T09:32:52-05:00</published><updated>2026-04-22T09:32:52-05:00</updated><id>https://awesomepapers.io/similarity-search/publications/zhuang2025episode</id><content type="html" xml:base="https://awesomepapers.io/similarity-search/publications/zhuang2025episode/"><![CDATA[<p>In few-shot classification tasks (so-called episodes), a small set of labeled support samples is provided during inference to aid the classification of unlabeled query samples. 
Metric-based models typically operate by computing similarities between query and support embeddings within a learned metric space, followed by nearest-neighbor classification. However, these labeled support samples are often underutilized: they are only used for similarity comparison, despite their potential to fine-tune and adapt the metric space itself to the classes in the current episode. To address this, we propose a series of simple yet effective episode-specific, during-inference fine-tuning methods for metric-based models, including Rotational Division Fine-Tuning (RDFT) and its two variants, Iterative Division Fine-Tuning (IDFT) and Augmented Division Fine-Tuning (ADFT). These methods construct pseudo support-query pairs from the given support set to enable fine-tuning even for non-parametric models. Nevertheless, the severely limited amount of data in each task poses a substantial risk of overfitting when applying such fine-tuning strategies. To mitigate this, we further propose to train the metric-based model within an optimization-based meta-learning framework. With the combined efforts of episode-specific fine-tuning and optimization-based meta-training, metric-based models are equipped with the ability to rapidly adapt to the limited support samples during inference while avoiding overfitting. We validate our approach on three audio datasets from diverse domains, namely ESC-50 (environmental sounds), Speech Commands V2 (spoken keywords), and Medley-solos-DB (musical instruments). 
Experimental results demonstrate that our approach consistently improves performance for all evaluated metric-based models (especially for attention-based models) and generalizes well across different audio domains.</p>]]></content><author><name>Awesome Papers</name></author><category term="Uncategorized" /><summary type="html"><![CDATA[In few-shot classification tasks (so-called episodes), a small set of labeled support samples is provided during inference to aid the classification of unlabeled query samples. Metric-based models typically operate by computing similarities between query and support embeddings within a learned metric space, followed by nearest-neighbor classification. However, these labeled support samples are often underutilized: they are only used for similarity comparison, despite their potential to fine-tune and adapt the metric space itself to the classes in the current episode. To address this, we propose a series of simple yet effective episode-specific, during-inference fine-tuning methods for metric-based models, including Rotational Division Fine-Tuning (RDFT) and its two variants, Iterative Division Fine-Tuning (IDFT) and Augmented Division Fine-Tuning (ADFT). These methods construct pseudo support-query pairs from the given support set to enable fine-tuning even for non-parametric models. Nevertheless, the severely limited amount of data in each task poses a substantial risk of overfitting when applying such fine-tuning strategies. To mitigate this, we further propose to train the metric-based model within an optimization-based meta-learning framework. With the combined efforts of episode-specific fine-tuning and optimization-based meta-training, metric-based models are equipped with the ability to rapidly adapt to the limited support samples during inference while avoiding overfitting. 
We validate our approach on three audio datasets from diverse domains, namely ESC-50 (environmental sounds), Speech Commands V2 (spoken keywords), and Medley-solos-DB (musical instruments). Experimental results demonstrate that our approach consistently improves performance for all evaluated metric-based models (especially for attention-based models) and generalizes well across different audio domains.]]></summary></entry><entry><title type="html">Effective Training Of Convolutional Neural Networks With Low-bitwidth Weights And Activations</title><link href="https://awesomepapers.io/similarity-search/publications/zhuang2019effective/" rel="alternate" type="text/html" title="Effective Training Of Convolutional Neural Networks With Low-bitwidth Weights And Activations" /><published>2026-04-22T09:32:52-05:00</published><updated>2026-04-22T09:32:52-05:00</updated><id>https://awesomepapers.io/similarity-search/publications/zhuang2019effective</id><content type="html" xml:base="https://awesomepapers.io/similarity-search/publications/zhuang2019effective/"><![CDATA[<p>This paper tackles the problem of training a deep convolutional neural
network of both low-bitwidth weights and activations. Optimizing a
low-precision network is very challenging due to the non-differentiability of
the quantizer, which may result in substantial accuracy loss. To address this,
we propose three practical approaches, including (i) progressive quantization;
(ii) stochastic precision; and (iii) joint knowledge distillation to improve
the network training. First, for progressive quantization, we propose two
schemes to progressively find good local minima. Specifically, we propose to
first optimize a net with quantized weights and subsequently quantize
activations. This is in contrast to the traditional methods which optimize them
simultaneously. Furthermore, we propose a second progressive quantization
scheme which gradually decreases the bit-width from high-precision to
low-precision during training. Second, to alleviate the excessive training
burden due to the multi-round training stages, we further propose a one-stage
stochastic precision strategy to randomly sample and quantize sub-networks
while keeping other parts in full-precision. Finally, we adopt a novel learning
scheme to jointly train a full-precision model alongside the low-precision one.
By doing so, the full-precision model provides hints to guide the low-precision
model training and significantly improves the performance of the low-precision
network. Extensive experiments on various datasets (e.g., CIFAR-100, ImageNet)
show the effectiveness of the proposed methods.</p>]]></content><author><name>Awesome Papers</name></author><category term="Uncategorized" /><summary type="html"><![CDATA[This paper tackles the problem of training a deep convolutional neural network with both low-bitwidth weights and activations. Optimizing a low-precision network is very challenging due to the non-differentiability of the quantizer, which may result in substantial accuracy loss. To address this, we propose three practical approaches, including (i) progressive quantization; (ii) stochastic precision; and (iii) joint knowledge distillation to improve the network training. First, for progressive quantization, we propose two schemes to progressively find good local minima. Specifically, we propose to first optimize a net with quantized weights and subsequently quantize activations. This is in contrast to the traditional methods which optimize them simultaneously. Furthermore, we propose a second progressive quantization scheme which gradually decreases the bit-width from high-precision to low-precision during training. Second, to alleviate the excessive training burden due to the multi-round training stages, we further propose a one-stage stochastic precision strategy to randomly sample and quantize sub-networks while keeping other parts in full-precision. Finally, we adopt a novel learning scheme to jointly train a full-precision model alongside the low-precision one. By doing so, the full-precision model provides hints to guide the low-precision model training and significantly improves the performance of the low-precision network. 
Extensive experiments on various datasets (e.g., CIFAR-100, ImageNet) show the effectiveness of the proposed methods.]]></summary></entry><entry><title type="html">Fast Training Of Triplet-based Deep Binary Embedding Networks</title><link href="https://awesomepapers.io/similarity-search/publications/zhuang2016fast/" rel="alternate" type="text/html" title="Fast Training Of Triplet-based Deep Binary Embedding Networks" /><published>2026-04-22T09:32:52-05:00</published><updated>2026-04-22T09:32:52-05:00</updated><id>https://awesomepapers.io/similarity-search/publications/zhuang2016fast</id><content type="html" xml:base="https://awesomepapers.io/similarity-search/publications/zhuang2016fast/"><![CDATA[<p>In this paper, we aim to learn a mapping (or embedding) from images to a
compact binary space in which Hamming distances correspond to a ranking measure
for the image retrieval task.
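For illustration, ranking by Hamming distance over binary codes can be sketched as follows. This is a generic sketch of Hamming-distance retrieval, not the paper's pipeline; the `hamming_rank` helper and the toy codes are our own:

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")  # stable: ties keep db order
    return order, dists[order]

# Toy 4-bit codes for three database items and one query.
db = np.array([[0, 1, 1, 0], [1, 1, 1, 1], [0, 1, 0, 0]], dtype=np.uint8)
q = np.array([0, 1, 1, 1], dtype=np.uint8)
order, d = hamming_rank(q, db)
print(order.tolist(), d.tolist())  # [0, 1, 2] [1, 1, 2]
```

Because the codes are compact bit vectors, this distance reduces to XOR-and-popcount in practice, which is what makes such embeddings attractive for large-scale retrieval.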
  We make use of a triplet loss because this has been shown to be most
effective for ranking problems.
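A standard triplet ranking loss (not the paper's specific formulation) can be sketched like this, with squared Euclidean distances and a hypothetical margin of 1.0:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: the positive should be closer to the
    anchor than the negative by at least `margin` (squared L2 here)."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings: the positive is much nearer the anchor than the negative,
# so the margin is already satisfied and the loss is zero.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])
n = np.array([1.0, 1.0])
print(triplet_loss(a, p, n))  # 0.0
```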
  However, training in previous works can be prohibitively expensive because
optimization is performed directly on the triplet space, where the
number of possible triplets for training is cubic in the number of training
examples.
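The cubic growth is easy to verify by counting. A small sketch (our own illustration, assuming triplets where the anchor and positive share a label and the negative does not):

```python
from collections import Counter

def num_triplets(labels):
    """Count ordered (anchor, positive, negative) triplets: anchor and
    positive share a label, the negative carries a different one."""
    counts = Counter(labels)
    n = len(labels)
    # Each class of size k gives k*(k-1) anchor/positive pairs and
    # (n - k) choices of negative.
    return sum(k * (k - 1) * (n - k) for k in counts.values())

# Just 20 examples in 2 balanced classes already yield 1800 triplets;
# the count grows roughly cubically with the dataset size.
print(num_triplets([0] * 10 + [1] * 10))  # 1800
```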
  To address this issue, we propose to formulate high-order binary codes
learning as a multi-label classification problem by explicitly separating
learning into two interleaved stages.
  To solve the first stage, we design a large-scale high-order binary codes
inference algorithm to reduce the high-order objective to a standard binary
quadratic problem such that graph cuts can be used to efficiently infer the
binary codes that serve as the labels of the training data.
  In the second stage we propose to map the original image to compact binary
codes via carefully designed deep convolutional neural networks (CNNs) and the
hashing function fitting can be solved by training binary CNN classifiers.
  An incremental/interleaved optimization strategy is proposed to ensure that
these two stages interact with each other during training for better
accuracy.
  We conduct experiments on several benchmark datasets, which demonstrate both
improved training time (by as much as two orders of magnitude) and
state-of-the-art hashing performance on various retrieval tasks.</p>]]></content><author><name>Awesome Papers</name></author><category term="Image Retrieval" /><category term="Survey Paper" /><summary type="html"><![CDATA[In this paper, we aim to learn a mapping (or embedding) from images to a compact binary space in which Hamming distances correspond to a ranking measure for the image retrieval task. We make use of a triplet loss because this has been shown to be most effective for ranking problems. However, training in previous works can be prohibitively expensive because optimization is performed directly on the triplet space, where the number of possible triplets for training is cubic in the number of training examples. To address this issue, we propose to formulate high-order binary codes learning as a multi-label classification problem by explicitly separating learning into two interleaved stages. To solve the first stage, we design a large-scale high-order binary codes inference algorithm to reduce the high-order objective to a standard binary quadratic problem such that graph cuts can be used to efficiently infer the binary codes that serve as the labels of the training data. In the second stage we propose to map the original image to compact binary codes via carefully designed deep convolutional neural networks (CNNs) and the hashing function fitting can be solved by training binary CNN classifiers. An incremental/interleaved optimization strategy is proposed to ensure that these two stages interact with each other during training for better accuracy. We conduct experiments on several benchmark datasets, which demonstrate both improved training time (by as much as two orders of magnitude) and state-of-the-art hashing performance on various retrieval tasks.]]></summary></entry></feed>