Online Learning For Cooperative Multi-player Multi-armed Bandits
2021 · William Chang, Mehdi Jafarnia-Jahromi, Rahul Jain
Abstract
We introduce a framework for decentralized online learning for multi-armed bandits (MAB) with multiple cooperative players. The reward obtained by the players in each round depends on the actions taken by all the players. It's a team setting, and the objective is common. Information asymmetry is what makes the problem interesting and challenging. We consider three types of information asymmetry: action information asymmetry, when the actions of the players can't be observed but the rewards received are common; reward information asymmetry, when the actions of the other players are observable but the rewards received are IID from the same distribution; and the case with both action and reward information asymmetry. For the first setting, we propose a UCB-inspired algorithm that achieves \(O(\log T)\) regret whether the rewards are IID or Markovian. For the second setting, we offer an environment such that the algorithm given for the first setting gives linear regret. For the third setting, we s
Related papers
- Optimal Cooperative Multiplayer Learning Bandits With Noisy Rewards And No Communication (2023)
- Learning To Coordinate Under Threshold Rewards: A Cooperative Multi-agent Bandit Framework (2025)
- Multi-agent Bandit Learning Through Heterogeneous Action Erasure Channels (2023)
- Learning For Bandits Under Action Erasures (2024)
- A Closer Look At The Worst-case Behavior Of Multi-armed Bandit Algorithms (2021)
- Provably Efficient Reinforcement Learning For Adversarial Restless Multi-armed Bandits With Unknown Transitions And Bandit Feedback (2024)
- Near-optimal Collaborative Learning In Bandits (2022)
- Non-stationary Latent Auto-regressive Bandits (2024)