Exploring The Meaningfulness Of Nearest Neighbor Search In High-dimensional Space
2024 · Zhonghan Chen, Ruiyuan Zhang, Xi Zhao, et al.
Abstract
Dense high-dimensional vectors are becoming increasingly vital in fields such as computer vision, machine learning, and large language models (LLMs), where they serve as standard representations for multimodal data. The dimensionality of these vectors can now easily exceed several thousand. Although nearest neighbor search (NNS) over these dense high-dimensional vectors is widely used for retrieval-augmented generation (RAG) and many other applications, the effectiveness of NNS in such high-dimensional spaces remains uncertain, given the possible challenges caused by the "curse of dimensionality." To address this question, in this paper we conduct extensive NNS studies with different distance functions, such as \(L_1\) distance, \(L_2\) distance, and angular distance, across diverse embedding datasets of varied types, dimensionality, and modality. Our aim is to investigate the factors that influence the meaningfulness of NNS. Our experiments reveal that high-dimensional text embeddings exhi
Related papers
- Dimensionality-reduction Techniques For Approximate Nearest Neighbor Search: A Survey And Evaluation (2024)
- Experimental Analysis Of Locality Sensitive Hashing Techniques For High-dimensional Approximate Nearest Neighbor Searches (2020)
- On High-dimensional Modifications Of The Nearest Neighbor Classifier (2024)
- High Dimensional Similarity Search With Satellite System Graph: Efficiency, Scalability, And Unindexed Query Compatibility (2019)
- Lucene For Approximate Nearest-neighbors Search On Arbitrary Dense Vectors (2019)
- Scaling Laws For Embedding Dimension In Information Retrieval (2026)
- The Role Of Local Intrinsic Dimensionality In Benchmarking Nearest Neighbor Search (2019)
- Medical Image Retrieval Via Nearest Neighbor Search On Pre-trained Image Features (2022)