A Note On "Efficient Task-specific Data Valuation For Nearest Neighbor Algorithms"
2023 · Jiachen T. Wang, Ruoxi Jia
Abstract
Data valuation is a growing research field that studies the influence of individual data points for machine learning (ML) models. Data Shapley, inspired by cooperative game theory and economics, is an effective method for data valuation. However, it is well-known that the Shapley value (SV) can be computationally expensive. Fortunately, Jia et al. (2019) showed that for K-Nearest Neighbors (KNN) models, the computation of Data Shapley is surprisingly simple and efficient. In this note, we revisit the work of Jia et al. (2019) and propose a more natural and interpretable utility function that better reflects the performance of KNN models. We derive the corresponding calculation procedure for the Data Shapley of KNN classifiers/regressors with the new utility functions. Our new approach, dubbed soft-label KNN-SV, achieves the same time complexity as the original method. We further provide an efficient approximation algorithm for soft-label KNN-SV based on locality sensitive hashing (LS
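To make the "surprisingly simple and efficient" claim concrete, here is a minimal Python sketch of the closed-form recursion usually attributed to Jia et al. (2019) for an unweighted KNN classifier and a single test point. It covers the original utility (the fraction of the K nearest neighbors whose label matches the test label, divided by K), not the soft-label utility proposed in this note. The function names, the toy data, and the assumptions of distinct distances and 1 <= K <= N are illustrative choices, not taken from either paper. A brute-force Shapley computation is included only as a sanity check on tiny inputs.

```python
from itertools import combinations
from math import comb


def knn_shapley_single_test(dists, labels, y_test, K):
    """Sketch of the KNN-Shapley recursion for one test point.

    dists[i] is the distance from training point i to the test point.
    Assumes distinct distances and 1 <= K <= len(dists).
    """
    n = len(dists)
    if not 1 <= K <= n:
        raise ValueError("this sketch assumes 1 <= K <= n")
    order = sorted(range(n), key=lambda i: dists[i])  # nearest first
    match = [1.0 if labels[i] == y_test else 0.0 for i in order]
    values = [0.0] * n
    # Farthest point: its value is its label match divided by n.
    s = match[n - 1] / n
    values[order[n - 1]] = s
    # Walk from the second-farthest point to the nearest (1-based rank).
    for rank in range(n - 1, 0, -1):
        s += (match[rank - 1] - match[rank]) / K * min(K, rank) / rank
        values[order[rank - 1]] = s
    return values


def knn_utility(subset, dists, labels, y_test, K):
    """Original utility: matching labels among the K nearest in subset, over K."""
    nearest = sorted(subset, key=lambda i: dists[i])[:K]
    return sum(labels[i] == y_test for i in nearest) / K


def brute_force_shapley(dists, labels, y_test, K):
    """Exact Shapley values by enumerating all subsets (tiny n only)."""
    n = len(dists)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = 1.0 / (n * comb(n - 1, size))
            for subset in combinations(others, size):
                gain = (knn_utility(subset + (i,), dists, labels, y_test, K)
                        - knn_utility(subset, dists, labels, y_test, K))
                total += weight * gain
        values.append(total)
    return values


# Toy example (hypothetical data): four training points, K = 2.
toy_dists = [0.9, 0.2, 0.5, 0.7]
toy_labels = [1, 1, 0, 1]
fast = knn_shapley_single_test(toy_dists, toy_labels, y_test=1, K=2)
slow = brute_force_shapley(toy_dists, toy_labels, y_test=1, K=2)
print(fast)
print(slow)
```

The recursion costs one sort plus a linear pass, so it runs in O(N log N) per test point, whereas the brute-force check is exponential in N. For a full test set, the per-point values would typically be averaged over test points.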
Related papers
- Efficient Task-specific Data Valuation For Nearest Neighbor Algorithms (2019)
- Efficient Data-aware Distance Comparison Operations For High-dimensional Approximate Nearest Neighbor Search (2024)
- A Scalable Solution To The Nearest Neighbor Search Problem Through Local-search Methods On Neighbor Graphs (2017)
- Explaining The Success Of Nearest Neighbor Methods In Prediction (2025)
- Adaptive Estimation For Approximate K-nearest-neighbor Computations (2019)
- A New Hashing Based Nearest Neighbors Selection Technique For Big Datasets (2020)
- On High-dimensional Modifications Of The Nearest Neighbor Classifier (2024)
- Interpretable Locally Adaptive Nearest Neighbors (2020)