Efficient Task-specific Data Valuation For Nearest Neighbor Algorithms
2019 Β· Ruoxi Jia, David Dao, Boxin Wang, et al.
Abstract
Given a data set \(\mathcal\{D\}\) containing millions of data points and a data consumer who is willing to pay for \$\(X\) to train a machine learning (ML) model over \(\mathcal\{D\}\), how should we distribute this \$\(X\) to each data point to reflect its "value"? In this paper, we define the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality and decentralizability. For general, bounded utility functions, the Shapley value is known to be challenging to compute: to get Shapley values for all \(N\) data points, it requires \(O(2^N)\) model evaluations for exact computation and \(O(Nlog N)\) for \((\epsilon, \delta)\)-approximation. In this paper, we focus on one popular family of ML models relying on \(K\)-nearest neighbors (\(K\)NN). The most surprising result is that for unweighted \(K\)NN classifiers and regressors, the Shapley value of all \(N\) data points can be computed, ex
Authors
(none)
Tags
Stats
Related papers
- A Note On "efficient Task-specific Data Valuation For Nearest Neighbor Algorithms" (2023)15.75
- Efficient Data-aware Distance Comparison Operations For High-dimensional Approximate Nearest Neighbor Search (2024)5.24
- Local Distance Metric Learning For Nearest Neighbor Algorithm (2018)0.00
- Effective And General Distance Computation For Approximate Nearest Neighbor Search (2024)5.84
- Adaptive Estimation For Approximate K-nearest-neighbor Computations (2019)0.00
- On High-dimensional Modifications Of The Nearest Neighbor Classifier (2024)0.00
- Distributionally Robust Weighted \(k\)-nearest Neighbors (2020)0.00
- Fast And Bayes-consistent Nearest Neighbors (2019)0.00