Abstract

arXiv:2602.14474v2 Announce Type: replace Abstract: We study \(K\)-armed Multiarmed Bandit (MAB) problem with \(M\) heterogeneous data sources, each exhibiting unknown and distinct noise variances \(\\{\sigma_j^2\\}_\{j=1\}^M\). The learner's objective is standard MAB regret minimization, with the additional complexity of adaptively selecting which data source to query from at each round. We propose Source-Optimistic Adaptive Regret minimization (SOAR), a novel algorithm that quickly prunes high-variance sources using sharp variance-concentration bounds, followed by a `balanced min-max LCB-UCB approach' that seamlessly integrates the parallel tasks of identifying the best arm and the optimal (minimum-variance) data source. Our analysis shows SOAR achieves an instance-dependent regret bound of \(\tilde\{O\}\left(\{\sigma^*\}^2\sum_\{i=2\}^K \frac\{log T\}\{\Delta_i\} + \sqrt\{K \sum_\{j=1\}^M \sigma_j^2\}\right)\), up to preprocessing costs depending only on problem parameters, where \

Authors

(none)

Tags

  • Uncategorized

Stats

  • citations0
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score0.00
  • arxiv keybhat2026one

Related papers