Evidential Transformers For Improved Image Retrieval
2024 · Danilo Dordevic, Suryansh Kumar
Abstract
We introduce the Evidential Transformer, an uncertainty-driven transformer model for improved and robust image retrieval. This paper makes several contributions to content-based image retrieval (CBIR). First, we incorporate probabilistic methods into image retrieval and obtain robust, reliable results: evidential classification surpasses conventional multiclass classification training as a baseline for deep metric learning. Second, we improve the state-of-the-art retrieval results on several datasets by leveraging the Global Context Vision Transformer (GC ViT) architecture. Our experimental results consistently demonstrate the reliability of our approach, setting a new benchmark in CBIR in all test settings on the Stanford Online Products (SOP) and CUB-200-2011 datasets.
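The abstract does not spell out how evidential classification is formulated. As a rough illustration only, the sketch below follows a formulation common in the evidential deep learning literature: class logits are mapped to non-negative evidence, the evidence parameterizes a Dirichlet distribution, and the total Dirichlet strength yields an uncertainty score. The function name `evidential_outputs`, the softplus activation, and the uncertainty definition are assumptions made for this example and are not confirmed as the authors' method.

```python
import math


def evidential_outputs(logits):
    """Map raw class logits to expected class probabilities and an uncertainty score.

    Hypothetical helper: a generic Dirichlet-based sketch, not the paper's code.
    """
    # Numerically stable softplus turns each logit into non-negative evidence
    # (the choice of softplus is an assumption).
    evidence = [math.log1p(math.exp(-abs(z))) + max(z, 0.0) for z in logits]
    # Dirichlet concentration parameters: one pseudo-count plus the evidence.
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    # Expected class probabilities under the Dirichlet.
    probs = [a / strength for a in alpha]
    # Uncertainty is high when total evidence is low; it lies in (0, 1].
    uncertainty = len(alpha) / strength
    return probs, uncertainty


if __name__ == "__main__":
    confident = evidential_outputs([10.0, 0.0, 0.0])
    unsure = evidential_outputs([-20.0, -20.0, -20.0])
    print("confident:", confident)
    print("unsure:", unsure)
```

In a retrieval setting, a score of this kind could in principle be used to flag embeddings produced from low-evidence inputs, but how the Evidential Transformer actually uses its uncertainty is not described in this abstract.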
Related papers
- Training Vision Transformers For Image Retrieval (2021)
- Boosting Vision Transformers For Image Retrieval (2022)
- Evit: Privacy-preserving Image Retrieval Via Encrypted Vision Transformer In Cloud Computing (2022)
- Investigating The Vision Transformer Model For Image Retrieval Tasks (2021)
- Thinking Fast And Slow: Efficient Text-to-visual Retrieval With Transformers (2021)
- Case-enhanced Vision Transformer: Improving Explanations Of Image Similarity With A Vit-based Similarity Metric (2024)
- STIR: Siamese Transformer For Image Retrieval Postprocessing (2023)
- Evaluating Dense Passage Retrieval Using Transformers (2022)