On Strengths And Limitations Of Single-vector Embeddings
2026 · Archish S, Mihir Agarwal, Ankit Garg, et al.
Abstract
Recent work (Weller et al., 2025) introduced a naturalistic dataset called LIMIT and showed empirically that a wide range of popular single-vector embedding models suffer substantial drops in retrieval quality on it, raising concerns about the reliability of single-vector embeddings for retrieval. Although Weller et al. (2025) proposed limited dimensionality as the main contributing factor, we show that dimensionality alone cannot explain the observed failures. Results of Alon et al. (2016) imply that \(2k+1\)-dimensional vector embeddings suffice for top-\(k\) retrieval, which points to other drivers of the poor performance. Controlling for tokenization artifacts and for linguistic similarity between attributes yields only modest gains. In contrast, we find that domain shift and misalignment between embedding similarities and the task's underlying notion of relevance are major contributors; finetuning mitigates these effects and can improve recall substantially. Even wit…
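To make the \(2k+1\) bound concrete, here is a minimal sketch of one standard way such a bound can be realized: the moment-curve construction. This is our illustration under stated assumptions, not necessarily the argument used by the authors or by Alon et al. (2016). We assume each document j is assigned a distinct scalar t_j and embedded as (1, t_j, t_j^2, ..., t_j^{2k}). For any relevant set S of size k, the query is taken to be the coefficient vector of p(t) = -∏_{s∈S}(t - s)^2, a polynomial of degree 2k. The dot product then equals p(t_j), which is 0 for relevant documents and strictly negative for all others, so the top-k results by dot product are exactly S, for any number of documents. The helper names (`doc_embedding`, `query_embedding`, `poly_mul`, `top_k`) are hypothetical. Exact `Fraction` arithmetic avoids floating-point ties.

```python
from fractions import Fraction
from itertools import combinations


def doc_embedding(t, k):
    # Hypothetical helper: point on the moment curve in R^(2k+1): (1, t, ..., t^(2k)).
    return [t ** p for p in range(2 * k + 1)]


def poly_mul(a, b):
    # Multiply two polynomials stored as coefficient lists (lowest degree first).
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def query_embedding(relevant_ts, k):
    # Hypothetical helper: coefficients of p(t) = -prod_{s in S} (t - s)^2,
    # zero-padded to length 2k+1. p vanishes exactly on S and is negative elsewhere.
    coeffs = [Fraction(1)]
    for s in relevant_ts:
        # (t - s)^2 = s^2 - 2*s*t + t^2
        coeffs = poly_mul(coeffs, [s * s, -2 * s, Fraction(1)])
    coeffs = [-c for c in coeffs]
    return coeffs + [Fraction(0)] * (2 * k + 1 - len(coeffs))


def top_k(query, docs, k):
    # Rank documents by dot product with the query and return the top-k indices.
    scores = [sum(q * d for q, d in zip(query, doc)) for doc in docs]
    ranked = sorted(range(len(docs)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


# Check every relevant set of size k = 3 over n = 8 documents in 2k+1 = 7 dimensions.
n, k = 8, 3
ts = [Fraction(j) for j in range(n)]
docs = [doc_embedding(t, k) for t in ts]
checked = 0
for S in combinations(range(n), k):
    q = query_embedding([ts[j] for j in S], k)
    assert top_k(q, docs, k) == set(S)
    checked += 1
print(f"checked {checked} subsets of size {k}")  # prints: checked 56 subsets of size 3
```

The embedding dimension depends only on k, not on the number of documents n. This is why, under this construction, dimensionality by itself does not force failure on top-k retrieval. The construction says nothing about whether a trained model actually learns such embeddings from text.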
Related papers
- On The Theoretical Limitations Of Embedding-based Retrieval (2025)
- Scaling Laws For Embedding Dimension In Information Retrieval (2026)
- Breaking The Curse Of Dimensionality: On The Stability Of Modern Vector Retrieval (2025)
- GleanVec: Accelerating Vector Search With Minimalist Nonlinear Dimensionality Reduction (2024)
- Semantic Certainty Assessment In Vector Retrieval Systems: A Novel Framework For Embedding Quality Evaluation (2025)
- Dense Retrievers Can Fail On Simple Queries: Revealing The Granularity Dilemma Of Embeddings (2025)
- Randomly Removing 50% Of Dimensions In Text Embeddings Has Minimal Impact On Retrieval And Classification Tasks (2025)
- Experimental Analysis Of Large-scale Learnable Vector Storage Compression (2023)