TransMatcher: Deep Image Matching Through Transformers For Generalizable Person Re-identification
2021 · Shengcai Liao, Ling Shao
Abstract
Transformers have recently gained increasing attention in computer vision. However, existing studies mostly use Transformers for feature representation learning, e.g. for image classification and dense predictions, and the generalizability of Transformers is unknown. In this work, we further investigate the possibility of applying Transformers for image matching and metric learning given pairs of images. We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention. Thus, we further design two naive solutions, i.e. query-gallery concatenation in ViT, and query-gallery cross-attention in the vanilla Transformer. The latter improves the performance, but it is still limited. This implies that the attention mechanism in Transformers is primarily designed for global feature aggregation, which is not naturally suitable for image matching. Accordingly, we propose a new simplified decoder, w
Related papers
- Interpretable And Generalizable Person Re-identification With Query-adaptive Convolution And Temporal Lifting (2019) 20.43
- Training Vision Transformers For Image Retrieval (2021) 0.00
- Boosting Vision Transformers For Image Retrieval (2022) 15.28
- Transhash: Transformer-based Hamming Hashing For Efficient Image Retrieval (2021) 13.44
- STIR: Siamese Transformer For Image Retrieval Postprocessing (2023) 11.23
- Thinking Fast And Slow: Efficient Text-to-visual Retrieval With Transformers (2021) 15.16
- Vision Transformer Hashing For Image Retrieval (2021) 17.01
- Deep Co-attention Based Comparators For Relative Representation Learning In Person Re-identification (2018) 13.34