Leveraging Semantic Cues From Foundation Vision Models For Enhanced Local Feature Correspondence
2024 · Felipe Cadar, Guilherme Potje, Renato Martins, et al.
Abstract
Visual correspondence is a crucial step in key computer vision tasks, including camera localization, image registration, and structure from motion. The most effective techniques for matching keypoints currently involve using learned sparse or dense matchers, which need pairs of images. These neural networks have a good general understanding of features from both images, but they often struggle to match points from different semantic areas. This paper presents a new method that uses semantic cues from foundation vision model features (like DINOv2) to enhance local feature matching by incorporating semantic reasoning into existing descriptors. Therefore, the learned descriptors do not require image pairs at inference time, allowing feature caching and fast matching using similarity search, unlike learned matchers. We present adapted versions of six existing descriptors, with an average increase in performance of 29% in camera localization, with comparable accuracy to existing matchers as
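The abstract notes that descriptors computed per image can be cached and matched by similarity search, with no image pair needed at inference time. A minimal sketch of that idea follows. It is not the authors' implementation: the mutual nearest-neighbor rule, the cosine similarity measure, the `cache` dictionary, and the toy 3-dimensional descriptors are all assumptions chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def nearest(queries, database):
    """For each query descriptor, return the index of the most similar database descriptor."""
    result = []
    for q in queries:
        sims = [sum(a * b for a, b in zip(q, d)) for d in database]
        result.append(max(range(len(sims)), key=sims.__getitem__))
    return result


def mutual_nearest_neighbors(desc_a, desc_b):
    """Keep a pair (i, j) only if i and j are each other's nearest neighbor."""
    a = [l2_normalize(d) for d in desc_a]
    b = [l2_normalize(d) for d in desc_b]
    a_to_b = nearest(a, b)
    b_to_a = nearest(b, a)
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


# Hypothetical cache: descriptors are extracted once per image and stored,
# so any two images can be matched later without re-running a network.
cache = {
    "image_a": [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.5, 0.5, 0.7]],
    "image_b": [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0]],
}

matches = mutual_nearest_neighbors(cache["image_a"], cache["image_b"])
print(matches)
```

The brute-force search here is quadratic in the number of keypoints. At scale, an approximate nearest-neighbor index would typically replace the inner loop.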
Related papers
- Leveraging Local And Global Descriptors In Parallel To Search Correspondences For Visual Localization (2020) 8.82
- Local Feature Matching Using Deep Learning: A Survey (2024) 18.68
- Yes, We CANN: Constrained Approximate Nearest Neighbors For Local Feature-based Visual Localization (2023) 14.99
- Learning Local Descriptors By Optimizing The Keypoint-correspondence Criterion: Applications To Face Matching, Learning From Unlabeled Videos And 3D-shape Retrieval (2016) 11.75
- Sparse-to-dense Hypercolumn Matching For Long-term Visual Localization (2019) 12.99
- Semantic Signatures For Large-scale Visual Localization (2020) 5.24
- SuperNCN: Neighbourhood Consensus Network For Robust Outdoor Scenes Matching (2019) 2.26
- Multi-scale Convolutions For Learning Context Aware Feature Representations (2019) 0.00