Random-Access Ranked Retrieval and Similarity Search

Abstract

arXiv:2508.01108v2 Announce Type: replace Abstract: We extend Random Access, a fundamental operation that enables efficient search and exploration algorithms, to the modern interactive data systems based on Ranked Retrieval and Similarity Search, where orderings are dynamically defined over a high-dimensional feature space. This extension enables efficient solutions for a wide range of applications, from data analytics tools and database systems to recommendation systems and machine learning. We formalize the Random-Access Ranked Retrieval (RAR) problem, and extend it to Similarity Search. Our algorithmic innovations include the development of a theoretically efficient algorithm based on geometric arrangements, achieving logarithmic query time. However, this method suffers from exponential space complexity in high dimensions. Therefore, we develop a second class of algorithms based on $\varepsilon$-sampling, which consume a linear space. Since exactly locating the tuple at a specific rank is challenging due to its connection to the range counting problem, we introduce a relaxed variant called $\kappa$-Random-Access Ranked Retrieval, which returns a small subset of size $\kappa$ guaranteed to contain the target tuple. To solve this problem efficiently, we define an intermediate problem, Stripe Range Retrieval (SRR), and design a hierarchical sampling data structure tailored for narrow stripe range queries. Our method achieves practical scalability in both data size and dimensionality. We prove near-optimal bounds on the efficiency of our algorithms and validate their performance through extensive experiments on real and synthetic datasets, demonstrating scalability to millions of tuples and hundreds of dimensions.

Abstract

Related papers