STIR: Siamese Transformer For Image Retrieval Postprocessing
2023 · Aleksei Shabanov, Aleksei Tarasov, Sergey Nikolenko
Abstract
Current metric learning approaches for image retrieval usually learn a space of informative latent representations in which simple measures such as cosine distance work well. Recent state-of-the-art methods such as HypViT move to more complex embedding spaces that may yield better results but are harder to scale to production environments. In this work, we first construct a simpler model based on triplet loss with hard negative mining that performs at the state-of-the-art level without these drawbacks. Second, we introduce a novel approach to image retrieval postprocessing called Siamese Transformer for Image Retrieval (STIR), which reranks several top outputs in a single forward pass. Unlike the previously proposed Reranking Transformers, STIR does not rely on global/local feature extraction; it directly compares a query image and a retrieved candidate at the pixel level using an attention mechanism. The resulting approach defines a new state of the
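The two ideas in the abstract, a triplet loss with hard negative mining and reranking of the top retrieved candidates, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the margin value, the in-batch mining strategy, and the `pair_score` callable (a hypothetical stand-in for a Siamese model that scores a query/candidate image pair) are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hard_negative_triplet_loss(embeddings, labels, margin=0.1):
    """Mean triplet loss where each (anchor, positive) pair is matched with the
    closest in-batch negative. Cosine distance and the margin are assumptions."""
    losses = []
    for i, anchor in enumerate(embeddings):
        # Distances from the anchor to every sample with a different label.
        neg = [1.0 - cosine(anchor, e)
               for e, lab in zip(embeddings, labels) if lab != labels[i]]
        if not neg:
            continue
        hardest_neg = min(neg)  # the negative closest to the anchor
        for j, positive in enumerate(embeddings):
            if j != i and labels[j] == labels[i]:
                d_pos = 1.0 - cosine(anchor, positive)
                losses.append(max(0.0, d_pos - hardest_neg + margin))
    return sum(losses) / len(losses) if losses else 0.0


def retrieve_then_rerank(query_emb, gallery_embs, pair_score, top_k=3):
    """First stage: rank the gallery by cosine similarity to the query.
    Second stage: reorder only the top_k candidates by pair_score(index),
    a hypothetical pairwise scorer where higher means more similar."""
    ranked = sorted(range(len(gallery_embs)),
                    key=lambda i: cosine(query_emb, gallery_embs[i]),
                    reverse=True)
    head, tail = ranked[:top_k], ranked[top_k:]
    head = sorted(head, key=pair_score, reverse=True)
    return head + tail


if __name__ == "__main__":
    gallery = [[0.9, 0.1], [1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    toy_scores = {0: 0.9, 1: 0.2}  # made-up pairwise scores for the top 2
    print(retrieve_then_rerank([1.0, 0.0], gallery, toy_scores.get, top_k=2))
    print(hard_negative_triplet_loss([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
                                     [0, 0, 1]))
```

Only the `top_k` head of the ranking is passed to the pairwise scorer, which reflects why a costlier comparison model can be affordable as a postprocessing step: its cost depends on `top_k`, not on the gallery size.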
Related papers
- Training Vision Transformers For Image Retrieval (2021)
- Boosting Vision Transformers For Image Retrieval (2022)
- Vision Transformer Hashing For Image Retrieval (2021)
- TransHash: Transformer-based Hamming Hashing For Efficient Image Retrieval (2021)
- DALG: Deep Attentive Local And Global Modeling For Image Retrieval (2022)
- Instance-level Image Retrieval Using Reranking Transformers (2021)
- TransMatcher: Deep Image Matching Through Transformers For Generalizable Person Re-identification (2021)
- Contextual Similarity Aggregation With Self-attention For Visual Re-ranking (2021)