One Good Source Is All You Need: Near-optimal Regret For Bandits Under Heterogeneous Noise
2026 Β· Amith Bhat, Haipeng Luo, Aadirupa Saha
Abstract
arXiv:2602.14474v2 Announce Type: replace Abstract: We study \(K\)-armed Multiarmed Bandit (MAB) problem with \(M\) heterogeneous data sources, each exhibiting unknown and distinct noise variances \(\\{\sigma_j^2\\}_\{j=1\}^M\). The learner's objective is standard MAB regret minimization, with the additional complexity of adaptively selecting which data source to query from at each round. We propose Source-Optimistic Adaptive Regret minimization (SOAR), a novel algorithm that quickly prunes high-variance sources using sharp variance-concentration bounds, followed by a `balanced min-max LCB-UCB approach' that seamlessly integrates the parallel tasks of identifying the best arm and the optimal (minimum-variance) data source. Our analysis shows SOAR achieves an instance-dependent regret bound of \(\tilde\{O\}\left(\{\sigma^*\}^2\sum_\{i=2\}^K \frac\{log T\}\{\Delta_i\} + \sqrt\{K \sum_\{j=1\}^M \sigma_j^2\}\right)\), up to preprocessing costs depending only on problem parameters, where \
Authors
(none)
Tags
Stats
Related papers
- A Closer Look At The Worst-case Behavior Of Multi-armed Bandit Algorithms (2021)0.00
- Unified Framework Of Distributional Regret In Multi-armed Bandits And Reinforcement Learning (2026)0.00
- Trading Off Rewards And Errors In Multi-armed Bandits (2026)0.00
- Online Learning For Cooperative Multi-player Multi-armed Bandits (2021)5.24
- Stochastic Multi-armed Bandits With Limited Control Variates (2026)0.00
- Learning For Bandits Under Action Erasures (2024)0.00
- Multi-agent Bandit Learning Through Heterogeneous Action Erasure Channels (2023)0.00
- Pure Exploration Under Mediators' Feedback (2023)0.00