Trading Off Rewards And Errors In Multi-armed Bandits
2026 · Akram Erraqabi, Alessandro Lazaric, Michal Valko, et al.
Abstract
arXiv:2605.00488v1. In multi-armed bandits, the most-explored arms are the most informative, while reward maximization typically pulls only the best arm. We study the tradeoff between identifying arm means accurately and accumulating reward, and present an algorithm with regret guarantees that interpolates between the two objectives. We provide both upper and lower bounds and validate the algorithm empirically.
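The tension described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it is a hypothetical mixing rule, assumed here only for illustration, in which a weight `w` in [0, 1] sets how often the learner pulls the empirically best arm instead of the least-pulled arm. The names `run_bandit` and `w`, the Gaussian reward noise, and the mean squared error metric are all assumptions that do not appear in the source.

```python
import random


def run_bandit(means, horizon, w, seed=0):
    """Simulate a bandit that mixes reward-seeking and estimation-seeking pulls.

    Illustrative assumption, not the method from the paper:
    with probability w the learner pulls the arm with the highest empirical
    mean; otherwise it pulls the arm that has been pulled least often.
    Returns (average reward, mean squared error of the arm estimates, counts).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # total observed reward per arm
    total = 0.0

    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once so each empirical mean is defined
        elif rng.random() < w:
            # reward-seeking step: exploit the current best estimate
            arm = max(range(k), key=lambda i: sums[i] / counts[i])
        else:
            # estimation-seeking step: even out the number of samples
            arm = min(range(k), key=lambda i: counts[i])

        reward = rng.gauss(means[arm], 1.0)  # unit-variance Gaussian noise
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    estimates = [sums[i] / counts[i] for i in range(k)]
    mse = sum((estimates[i] - means[i]) ** 2 for i in range(k)) / k
    return total / horizon, mse, counts


if __name__ == "__main__":
    # Sweep the weight from pure estimation (0.0) to pure exploitation (1.0).
    for w in (0.0, 0.5, 1.0):
        avg_reward, mse, counts = run_bandit([0.1, 0.5, 0.9], 5000, w)
        print(f"w={w:.1f} avg_reward={avg_reward:.3f} mse={mse:.4f} counts={counts}")
```

With `w = 0` the pulls are spread evenly, which favors accurate estimates of every arm; as `w` grows, pulls concentrate on the arm that currently looks best, which tends to raise reward while leaving the other arms with fewer samples.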
Related papers
- Unified Framework Of Distributional Regret In Multi-armed Bandits And Reinforcement Learning (2026)
- A Closer Look At The Worst-case Behavior Of Multi-armed Bandit Algorithms (2021)
- Online Learning With Erdős-Rényi Side-observation Graphs (2026)
- Bayesian Bandits: Balancing The Exploration-exploitation Tradeoff Via Double Sampling (2017)
- A Frequency-domain Analysis Of The Multi-armed Bandit Problem: A New Perspective On The Exploration-exploitation Trade-off (2025)
- One Good Source Is All You Need: Near-optimal Regret For Bandits Under Heterogeneous Noise (2026)
- Near-optimal Collaborative Learning In Bandits (2022)
- Flickering Multi-armed Bandits (2026)