A Theory-based Evaluation Of Nearest Neighbor Models Put Into Practice

Abstract

In the $k$ -nearest neighborhood model ( $k$ -NN), we are given a set of points $P$ , and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many $k$ -NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed $k$ -NN is not explicit. We study property testing of $k$ -NN graphs in theory and evaluate it empirically: given a point set $P \subset {R}^{δ}$ and a directed graph $G = (P, E)$ , is $G$ a $k$ -NN graph, i.e., every point $p \in P$ has outgoing edges to its $k$ nearest neighbors, or is it $ϵ$ -far from being a $k$ -NN graph? Here, $ϵ$ -far means that one has to change more than an $ϵ$ -fraction of the edges in order to make $G$ a $k$ -NN graph. We develop a randomi

Abstract

Related papers