Learning For Bandits Under Action Erasures
2024 Β· Osama Hanna, Merve Karakas, Lin F. Yang, et al.
Abstract
We consider a novel multi-arm bandit (MAB) setup, where a learner needs to communicate the actions to distributed agents over erasure channels, while the rewards for the actions are directly available to the learner through external sensors. In our model, while the distributed agents know if an action is erased, the central learner does not (there is no feedback), and thus does not know whether the observed reward resulted from the desired action or not. We propose a scheme that can work on top of any (existing or future) MAB algorithm and make it robust to action erasures. Our scheme results in a worst-case regret over action-erasure channels that is at most a factor of \(O(1/\sqrt\{1-\epsilon\})\) away from the no-erasure worst-case regret of the underlying MAB algorithm, where \(\epsilon\) is the erasure probability. We also propose a modification of the successive arm elimination algorithm and prove that its worst-case regret is \(\Tilde\{O\}(\sqrt\{KT\}+K/(1-\epsilon))\), which we
Authors
(none)
Tags
Stats
Related papers
- Multi-agent Bandit Learning Through Heterogeneous Action Erasure Channels (2023)0.00
- Online Learning For Cooperative Multi-player Multi-armed Bandits (2021)5.24
- Learning In Restless Bandits Under Exogenous Global Markov Process (2021)6.34
- Provably Efficient Reinforcement Learning For Adversarial Restless Multi-armed Bandits With Unknown Transitions And Bandit Feedback (2024)0.00
- Near-optimal Collaborative Learning In Bandits (2022)0.00
- A Closer Look At The Worst-case Behavior Of Multi-armed Bandit Algorithms (2021)0.00
- Optimal Cooperative Multiplayer Learning Bandits With Noisy Rewards And No Communication (2023)0.00
- Non-stationary Latent Auto-regressive Bandits (2024)0.00