One Good Source Is All You Need: Near-optimal Regret For Bandits Under Heterogeneous Noise

Abstract

arXiv:2602.14474v2 Announce Type: replace Abstract: We study \(K\)-armed Multiarmed Bandit (MAB) problem with \(M\) heterogeneous data sources, each exhibiting unknown and distinct noise variances \(\\{\sigma_j^2\\}_\{j=1\}^M\). The learner's objective is standard MAB regret minimization, with the additional complexity of adaptively selecting which data source to query from at each round. We propose Source-Optimistic Adaptive Regret minimization (SOAR), a novel algorithm that quickly prunes high-variance sources using sharp variance-concentration bounds, followed by a `balanced min-max LCB-UCB approach' that seamlessly integrates the parallel tasks of identifying the best arm and the optimal (minimum-variance) data source. Our analysis shows SOAR achieves an instance-dependent regret bound of \(\tilde\{O\}\left(\{\sigma^*\}^2\sum_\{i=2\}^K \frac\{log T\}\{\Delta_i\} + \sqrt\{K \sum_\{j=1\}^M \sigma_j^2\}\right)\), up to preprocessing costs depending only on problem parameters, where \

One Good Source Is All You Need: Near-optimal Regret For Bandits Under Heterogeneous Noise

Abstract

Authors

Tags

Stats

Related papers