Reveal Hidden Pitfalls And Navigate Next Generation Of Vector Similarity Search From Task-centric Views
2025 Β· Tingyang Chen, Cong Fu, Jiahua Wu, et al.
Abstract
Vector Similarity Search (VSS) in high-dimensional spaces is rapidly emerging as core functionality in next-generation database systems for numerous data-intensive services -- from embedding lookups in large language models (LLMs), to semantic information retrieval and recommendation engines. Current benchmarks, however, evaluate VSS primarily on the recall-latency trade-off against a ground truth defined solely by distance metrics, neglecting how retrieval quality ultimately impacts downstream tasks. This disconnect can mislead both academic research and industrial practice. We present Iceberg, a holistic benchmark suite for end-to-end evaluation of VSS methods in realistic application contexts. From a task-centric view, Iceberg uncovers the Information Loss Funnel, which identifies three principal sources of end-to-end performance degradation: (1) Embedding Loss during feature extraction; (2) Metric Misuse, where distances poorly reflect task relevance; (3) Data Distribution Sensit
Authors
(none)
Tags
Stats
Related papers
- Gleanvec: Accelerating Vector Search With Minimalist Nonlinear Dimensionality Reduction (2024)0.00
- Vectorsearch: Enhancing Document Retrieval With Semantic Embeddings And Optimized Search (2024)0.00
- Semantic Vector Encoding And Similarity Search Using Fulltext Search Engines (2017)6.77
- Starling: An I/o-efficient Disk-resident Graph Index Framework For High-dimensional Vector Similarity Search On Data Segment (2024)12.61
- Semantic Certainty Assessment In Vector Retrieval Systems: A Novel Framework For Embedding Quality Evaluation (2025)0.00
- Leanvec: Searching Vectors Faster By Making Them Fit (2023)0.00
- Breaking The Curse Of Dimensionality: On The Stability Of Modern Vector Retrieval (2025)0.00
- Approximate Vector Set Search Inspired By Fly Olfactory Neural System (2024)0.00