Unlearning Offline Stochastic Multi-armed Bandits
2026 · Zichun Ye, Runqi Wang, Xuchuang Wang, et al.
Abstract
arXiv:2605.00638v1. Machine unlearning aims to remove the influence of specified data points from a learned model, offering a principled way to process data-deletion requests and mitigate privacy risks without full retraining. Prior work has mainly studied unsupervised and supervised machine unlearning, leaving unlearning for sequential decision-making systems far less understood. We initiate the study of unlearning for a foundational sequential decision-making problem: offline stochastic multi-armed bandits (MAB). We formalize the privacy constraint for offline MAB and measure utility by the post-unlearning decision quality. We conduct a systematic study of both single- and multi-source unlearning scenarios under two data-generation models: the fixed-sample model and the distribution model. For these settings, our algorithmic design builds on two canonical base algorithms, the Gaussian mechanism and rollback, and we propose adaptive algorithms that switch between them according to the data regi
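The two base algorithms named in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not define either algorithm, so the sketch assumes that "rollback" means recomputing per-arm empirical means from the retained samples only, and that the "Gaussian mechanism" means adding zero-mean Gaussian noise to the full-data empirical means. The function names, the dictionary data layout, and the noise scale `sigma` are all hypothetical; in particular, how `sigma` should be calibrated to a privacy constraint is not stated in the abstract and is left as a free parameter.

```python
import random
from statistics import fmean

# Hypothetical data layout: arm id -> list of observed rewards for that arm.

def empirical_means(rewards):
    """Per-arm empirical mean reward; arms with no samples are skipped."""
    return {arm: fmean(r) for arm, r in rewards.items() if r}

def rollback_unlearn(rewards, delete):
    """Assumed reading of 'rollback': drop the deleted samples and
    recompute the per-arm means from the retained data only.
    `delete` maps arm id -> set of sample indices to forget."""
    kept = {
        arm: [x for i, x in enumerate(r) if i not in delete.get(arm, set())]
        for arm, r in rewards.items()
    }
    return empirical_means(kept)

def gaussian_unlearn(rewards, sigma, rng):
    """Assumed reading of 'Gaussian mechanism': keep the full-data means
    and add independent N(0, sigma^2) noise to each one.
    Calibrating sigma is outside the scope of this sketch."""
    return {arm: m + rng.gauss(0.0, sigma)
            for arm, m in empirical_means(rewards).items()}

def best_arm(means):
    """Greedy decision: the arm with the largest (possibly noisy) mean."""
    return max(means, key=means.get)

if __name__ == "__main__":
    data = {0: [1.0, 0.0, 1.0], 1: [0.5, 0.5]}
    forget = {0: {1}}  # forget the second sample of arm 0
    print(best_arm(rollback_unlearn(data, forget)))
    print(best_arm(gaussian_unlearn(data, 0.1, random.Random(0))))
```

Under these assumptions the trade-off is visible in the code: rollback touches the raw data and returns exact statistics on what remains, while the Gaussian variant leaves the data alone and pays for it with noise in the decision.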
Related papers
- Online Learning For Cooperative Multi-player Multi-armed Bandits (2021)
- Is Offline Decision Making Possible With Only Few Samples? Reliable Decisions In Data-starved Bandits Via Trust Region Enhancement (2024)
- Multi-agent Bandit Learning Through Heterogeneous Action Erasure Channels (2023)
- Non-stationary Latent Auto-regressive Bandits (2024)
- Reinforcement Unlearning (2023)
- Learning For Bandits Under Action Erasures (2024)
- Unified Framework Of Distributional Regret In Multi-armed Bandits And Reinforcement Learning (2026)
- A Frequency-domain Analysis Of The Multi-armed Bandit Problem: A New Perspective On The Exploration-exploitation Trade-off (2025)