SVD Provably Denoises Nearest Neighbor Data
2026 Β· Ravindran Kannan, Kijun Shin, David Woodruff
Abstract
We study the Nearest Neighbor Search (NNS) problem in a high-dimensional setting where data lies in a low-dimensional subspace and is corrupted by Gaussian noise. Specifically, we consider a semi-random model in which \(n\) points from an unknown \(k\)-dimensional subspace of \(\mathbb\{R\}^d\) (\(k \ll d\)) are perturbed by zero-mean \(d\)-dimensional Gaussian noise with variance \(\sigma^2\) per coordinate. Assuming the second-nearest neighbor is at least a factor \((1+\epsilon)\) farther from the query than the nearest neighbor, and given only the noisy data, our goal is to recover the nearest neighbor in the uncorrupted data. We prove three results. First, for \(\sigma \in O(1/k^\{1/4\})\), simply performing SVD denoises the data and provably recovers the correct nearest neighbor of the uncorrupted data. Second, for \(\sigma \gg 1/k^\{1/4\}\), the nearest neighbor in the uncorrupted data is not even identifiable from the noisy data in general, giving a matching lower bound and show
Authors
(none)
Tags
Stats
Related papers
- On The Resistance Of Nearest Neighbor To Random Noisy Labels (2016)0.00
- Approximate Near Neighbors For General Symmetric Norms (2016)8.35
- An Alternative Proof Of The Vulnerability Of Retrieval In High Intrinsic Dimensionality Neighborhood (2020)0.00
- Worst-case Performance Of Popular Approximate Nearest Neighbor Search Implementations: Guarantees And Limitations (2023)5.84
- Exploring The Meaningfulness Of Nearest Neighbor Search In High-dimensional Space (2024)2.26
- High-dimensional Approximate Nearest Neighbor Search: With Reliable And Efficient Distance Comparison Operations (2023)13.44
- Sublinear Data Structures For Nearest Neighbor In Ultra High Dimensions (2025)0.00
- Subspace Collision: An Efficient And Accurate Framework For High-dimensional Approximate Nearest Neighbor Search (2024)7.16