Abstract

In this paper we revisit the kernel density estimation problem: given a kernel \(K(x, y)\) and a dataset of \(n\) points in high dimensional Euclidean space, prepare a data structure that can quickly output, given a query \(q\), a \((1+\epsilon)\)-approximation to \(\mu:=\frac1\{|P|\}\sum_\{p\in P\} K(p, q)\). First, we give a single data structure based on classical near neighbor search techniques that improves upon or essentially matches the query time and space complexity for all radial kernels considered in the literature so far. We then show how to improve both the query complexity and runtime by using recent advances in data-dependent near neighbor search. We achieve our results by giving a new implementation of the natural importance sampling scheme. Unlike previous approaches, our algorithm first samples the dataset uniformly (considering a geometric sequence of sampling rates), and then uses existing approximate near neighbor search techniques on the resulting smaller datase

Authors

(none)

Tags

  • ANN Search

Stats

  • citations5
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score5.84
  • arxiv keycharikar2020kernel

Related papers