\(R^{2}\)Former: Unified Retrieval and Reranking Transformer for Place Recognition

Abstract

Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database. Conventional methods generally adopt aggregated CNN features for global retrieval and RANSAC-based geometric verification for reranking. However, RANSAC uses only geometric information and ignores other information that could be useful for reranking, such as local feature correlations and attention values. In this paper, we propose a unified place recognition framework that handles both retrieval and reranking with a novel transformer model, named \(R^{2}\)Former. The proposed reranking module takes feature correlation, attention value, and xy coordinates into account, and learns to determine whether an image pair is from the same location. The whole pipeline is end-to-end trainable, and the reranking module alone can also be adopted on other CNN or transformer backbones as a generic component. Remarkably, \(R^{2}\)Former significantly outperforms st
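To make the reranking inputs concrete, the following is a minimal sketch of how the three cues named in the abstract (feature correlation, attention value, and xy coordinates) could be assembled into per-feature tokens for an image pair. It is not the authors' implementation: the function names `build_pair_tokens` and `score_pair`, the best-match selection, the 7-value token layout, and the mean-pooled logistic scorer are all assumptions made for illustration. In particular, `score_pair` is only a stand-in for the learned transformer classifier.

```python
import numpy as np


def build_pair_tokens(feat_q, feat_r, attn_q, attn_r, xy_q, xy_r):
    """Assemble one token per query local feature (hypothetical layout).

    feat_*: (N, D) local descriptors, attn_*: (N,) attention values,
    xy_*: (N, 2) patch coordinates. Returns an (Nq, 7) array.
    """
    # L2-normalise so that the dot product is a cosine correlation.
    fq = feat_q / np.linalg.norm(feat_q, axis=1, keepdims=True)
    fr = feat_r / np.linalg.norm(feat_r, axis=1, keepdims=True)
    corr = fq @ fr.T                   # (Nq, Nr) correlation matrix
    best = corr.argmax(axis=1)         # best reference match per query feature
    rows = np.arange(fq.shape[0])
    return np.column_stack([
        corr[rows, best],              # feature correlation of the match
        attn_q,                        # attention value, query side
        attn_r[best],                  # attention value, reference side
        xy_q,                          # xy coordinates, query side
        xy_r[best],                    # xy coordinates, reference side
    ])


def score_pair(tokens, w, b):
    """Stand-in for the learned classifier: mean-pool, then logistic.

    Assumption: a real reranker would learn this mapping; w and b are
    hypothetical parameters of shape (7,) and scalar.
    """
    pooled = tokens.mean(axis=0)
    return float(1.0 / (1.0 + np.exp(-(pooled @ w + b))))
```

A pair would then be scored with `score_pair(build_pair_tokens(...), w, b)`, and the retrieved candidates sorted by that score. Unlike RANSAC, which sees only the matched coordinates, a model fed these tokens can also weigh how strong each match is and how much attention each patch received.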
