On The I/O Complexity Of The K-nearest Neighbor Problem

We consider static, external memory indexes for exact and approximate versions of the (k)-nearest neighbor ((k)-NN) problem, and show new lower bounds under a standard indivisibility assumption:

Polynomial space indexing schemes for high-dimensional (k)-NN in Hamming space cannot take advantage of block transfers: (Ω(k)) block reads are needed to to answer a query.
For the (\ell_\infty) metric the lower bound holds even if we allow (c)-appoximate nearest neighbors to be returned, for (c \in (1, 3)).
The restriction to (c < 3) is necessary: For every metric there exists an indexing scheme in the indexability model of Hellerstein et al.~using space (O(kn)), where (n) is the number of points, that can retrieve (k) 3-approximate nearest neighbors using (\lceil k/B\rceil) I/Os, which is optimal.
For specific metrics, data structures with better approximation factors are possible. For (k)-NN in Hamming space and every approximation factor (c>1) there exists a polynomial space data structure that returns (k) (c)-approximate nearest neighbors in (\lceil k/B\rceil) I/Os. To show these lower bounds we develop two new techniques: First, to handle that approximation algorithms have more freedom in deciding which result set to return we develop a relaxed version of the (\lambda)-set workload technique of Hellerstein et al. This technique allows us to show lower bounds that hold in (d\geq n) dimensions. To extend the lower bounds down to (d = O(k log(n/k))) dimensions, we develop a new deterministic dimension reduction technique that may be of independent interest.

Stay Updated

On The I/O Complexity Of The K-nearest Neighbor Problem

Similar Work