On The Theoretical Limitations Of Embedding-based Retrieval
2025 Β· Orion Weller, Michael Boratko, Iftekhar Naim, et al.
Abstract
Vector embeddings have been tasked with an ever-increasing set of retrieval tasks over the years, with a nascent rise in using them for reasoning, instruction-following, coding, and more. These new benchmarks push embeddings to work for any query and any notion of relevance that could be given. While prior works have pointed out theoretical limitations of vector embeddings, there is a common assumption that these difficulties are exclusively due to unrealistic queries, and those that are not can be overcome with better training data and larger models. In this work, we demonstrate that we may encounter these theoretical limitations in realistic settings with extremely simple queries. We connect known results in learning theory, showing that the number of top-k subsets of documents capable of being returned as the result of some query is limited by the dimension of the embedding. We empirically show that this holds true even if we directly optimize on the test set with free parameterized
Authors
(none)
Tags
Stats
Related papers
- On Strengths And Limitations Of Single-vector Embeddings (2026)0.00
- Scaling Laws For Embedding Dimension In Information Retrieval (2026)0.00
- Breaking The Curse Of Dimensionality: On The Stability Of Modern Vector Retrieval (2025)0.00
- Semantic Certainty Assessment In Vector Retrieval Systems: A Novel Framework For Embedding Quality Evaluation (2025)0.00
- Dense Retrievers Can Fail On Simple Queries: Revealing The Granularity Dilemma Of Embeddings (2025)2.86
- Vectorsearch: Enhancing Document Retrieval With Semantic Embeddings And Optimized Search (2024)0.00
- Experimental Analysis Of Large-scale Learnable Vector Storage Compression (2023)7.50
- Pylate: Flexible Training And Retrieval For Late Interaction Models (2025)3.58