Kernel Density Estimation Through Density Constrained Near Neighbor Search

Abstract

In this paper we revisit the kernel density estimation problem: given a kernel \(K(x, y)\) and a dataset of \(n\) points in high dimensional Euclidean space, prepare a data structure that can quickly output, given a query \(q\), a \((1+\epsilon)\)-approximation to \(\mu:=\frac1\{|P|\}\sum_\{p\in P\} K(p, q)\). First, we give a single data structure based on classical near neighbor search techniques that improves upon or essentially matches the query time and space complexity for all radial kernels considered in the literature so far. We then show how to improve both the query complexity and runtime by using recent advances in data-dependent near neighbor search. We achieve our results by giving a new implementation of the natural importance sampling scheme. Unlike previous approaches, our algorithm first samples the dataset uniformly (considering a geometric sequence of sampling rates), and then uses existing approximate near neighbor search techniques on the resulting smaller datase

Kernel Density Estimation Through Density Constrained Near Neighbor Search

Abstract

Authors

Tags

Stats

Related papers