ELViS: Efficient Visual Similarity From Local Descriptors That Generalizes Across Domains
2026 · Pavel Suma, Giorgos Kordopatis-Zilos, Yannis Kalantidis, et al.
Abstract
Large-scale instance-level training data is scarce, so models are typically trained on domain-specific datasets. Yet in real-world retrieval, they must handle diverse domains, making generalization to unseen data critical. We introduce ELViS, an image-to-image similarity model that generalizes effectively to unseen domains. Unlike conventional approaches, our model operates in similarity space rather than representation space, promoting cross-domain transfer. It leverages local descriptor correspondences, refines their similarities through an optimal transport step with data-dependent gains that suppress uninformative descriptors, and aggregates strong correspondences via a voting process into an image-level similarity. This design injects strong inductive biases, yielding a simple, efficient, and interpretable model. To assess generalization, we compile a benchmark of eight datasets spanning landmarks, artworks, products, and multi-domain collections, and evaluate ELViS as a re-rankin
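The pipeline the abstract outlines (local descriptor correspondences, an optimal transport refinement, then voting into one image-level score) can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the function names `sinkhorn` and `image_similarity`, the uniform marginals, the entropic regularization `eps`, and the top-k sum used as a stand-in for voting are all assumptions made for illustration. In particular, the data-dependent gains mentioned in the abstract are not modeled here.

```python
import numpy as np

def sinkhorn(scores, eps=0.1, n_iters=50):
    """Entropic optimal transport plan for a similarity matrix.

    Assumption: uniform row/column marginals. The abstract mentions
    data-dependent gains instead, which this sketch does not model.
    """
    n, m = scores.shape
    kernel = np.exp(scores / eps)          # higher similarity -> more mass
    row_mass = np.full(n, 1.0 / n)
    col_mass = np.full(m, 1.0 / m)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = row_mass / (kernel @ v)        # rescale rows to match row_mass
        v = col_mass / (kernel.T @ u)      # rescale columns to match col_mass
    return u[:, None] * kernel * v[None, :]

def image_similarity(desc_a, desc_b, top_k=5):
    """Image-level score from two sets of local descriptors (hypothetical helper)."""
    # L2-normalize so the dot product is cosine similarity.
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T                          # all pairwise correspondences
    plan = sinkhorn(sim)                   # refine similarities via transport
    weighted = (plan * sim).ravel()
    # Assumed stand-in for voting: sum the k strongest refined correspondences.
    k = min(top_k, weighted.size)
    strongest = np.partition(weighted, -k)[-k:]
    return float(strongest.sum())

rng = np.random.default_rng(0)
query = rng.normal(size=(8, 64))           # 8 local descriptors, 64-D each
same = query[::-1]                         # same descriptors, reordered
other = rng.normal(size=(10, 64))          # unrelated descriptors
print(image_similarity(query, same), image_similarity(query, other))
```

Because the score is computed from pairwise similarities rather than from a pooled image embedding, it is insensitive to the order of the descriptors, which is one way to read the abstract's point about operating in similarity space.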
Related papers
- ELIP: Enhanced Visual-language Foundation Models For Image Retrieval (2025) 2.26
- Efficient And Discriminative Image Feature Extraction For Universal Image Retrieval (2024) 4.94
- Visual Similarity Attention (2019) 0.00
- Exploiting Local Indexing And Deep Feature Confidence Scores For Fast Image-to-video Search (2018) 2.26
- AMES: Asymmetric And Memory-efficient Similarity Estimation For Instance-level Retrieval (2024) 9.70
- Unifying Deep Local And Global Features For Image Search (2020) 28.10
- Efficient Discovery And Effective Evaluation Of Visual Perceptual Similarity: A Benchmark And Beyond (2023) 4.52
- Exploiting Distribution Constraints For Scalable And Efficient Image Retrieval (2024) 0.00