No-regret Learning In Unknown Games With Correlated Payoffs
2019 Β· Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, et al.
Abstract
We consider the problem of learning to play a repeated multi-agent game with an unknown reward function. Single player online learning algorithms attain strong regret bounds when provided with full information feedback, which unfortunately is unavailable in many real-world scenarios. Bandit feedback alone, i.e., observing outcomes only for the selected action, yields substantially worse performance. In this paper, we consider a natural model where, besides a noisy measurement of the obtained reward, the player can also observe the opponents' actions. This feedback model, together with a regularity assumption on the reward function, allows us to exploit the correlations among different game outcomes by means of Gaussian processes (GPs). We propose a novel confidence-bound based bandit algorithm GP-MW, which utilizes the GP model for the reward function and runs a multiplicative weight (MW) method. We obtain novel kernel-dependent regret bounds that are comparable to the known bounds in
Authors
(none)
Tags
Stats
Related papers
- Online Learning In Unknown Markov Games (2020)0.00
- Optimal Cooperative Multiplayer Learning Bandits With Noisy Rewards And No Communication (2023)0.00
- Multi-agent Learning In Contextual Games Under Unknown Constraints (2023)0.00
- Adversarial Learning In Games With Bandit Feedback: Logarithmic Pure-strategy Maximin Regret (2026)0.00
- Prediction-aware Learning In Multi-agent Systems (2025)0.00
- Online Learning For Cooperative Multi-player Multi-armed Bandits (2021)5.24
- Distributed No-regret Learning In Multi-agent Systems (2020)0.00
- A Black-box Approach For Non-stationary Multi-agent Reinforcement Learning (2023)0.00