Case-Enhanced Vision Transformer: Improving Explanations Of Image Similarity With A ViT-based Similarity Metric
2024 · Ziwei Zhao, David Leake, Xiaomeng Ye, et al.
Abstract
This short paper presents preliminary research on the Case-Enhanced Vision Transformer (CEViT), a similarity measurement method aimed at improving the explainability of similarity assessments for image data. Initial experimental results suggest that integrating CEViT into k-Nearest Neighbor (k-NN) classification yields classification accuracy comparable to state-of-the-art computer vision models, while adding capabilities for illustrating differences between classes. CEViT explanations can be influenced by prior cases, to illustrate aspects of similarity relevant to those cases.
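As a rough illustration of what plugging a similarity measure into k-NN classification can look like, the sketch below ranks stored cases by a pluggable similarity function and takes a majority vote over the top k. It is not the authors' implementation: the names `knn_classify` and `cosine_similarity`, the feature-vector representation of images, and the tie-breaking rule are all assumptions made for illustration, and cosine similarity merely stands in for a learned metric such as CEViT.

```python
from collections import Counter
from math import sqrt
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Case = Tuple[Vector, str]  # (feature vector, class label); hypothetical case format


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Placeholder similarity. A learned metric (e.g. CEViT) would replace this."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(
    query: Vector,
    cases: List[Case],
    k: int = 3,
    similarity: Callable[[Vector, Vector], float] = cosine_similarity,
) -> Tuple[str, List[Tuple[float, str]]]:
    """Return the majority label among the k most similar cases,
    together with those neighbors as (similarity, label) pairs."""
    if not cases:
        raise ValueError("case base is empty")
    # Score every stored case against the query, most similar first.
    scored = sorted(
        ((similarity(query, feats), label) for feats, label in cases),
        key=lambda pair: pair[0],
        reverse=True,
    )
    neighbors = scored[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    # Assumed tie-break: among labels with the most votes, prefer the one
    # whose neighbor is most similar to the query.
    for _, label in neighbors:
        if votes[label] == top:
            return label, neighbors
    raise AssertionError("unreachable: neighbors is non-empty")


case_base = [
    ([1.0, 0.0], "cat"),
    ([0.9, 0.1], "cat"),
    ([0.0, 1.0], "dog"),
    ([0.1, 0.9], "dog"),
]
label, neighbors = knn_classify([0.8, 0.2], case_base, k=3)
```

Returning the retrieved neighbors alongside the label keeps the cases that drove the decision available for inspection, which is the natural hook for a case-based explanation.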
Related papers
- Towards Visually Explaining Similarity Models (2020) · 0.00
- VITR: Augmenting Vision Transformers With Relation-focused Learning For Cross-modal Information Retrieval (2023) · 4.52
- Evidential Transformers For Improved Image Retrieval (2024) · 0.00
- Visual Similarity Attention (2019) · 0.00
- Training Vision Transformers For Image Retrieval (2021) · 0.00
- Attributable Visual Similarity Learning (2022) · 11.68
- CorrEmbed: Evaluating Pre-trained Model Image Similarity Efficacy With A Novel Metric (2023) · 5.24
- Analyzing Local Representations Of Self-supervised Vision Transformers (2023) · 0.00