Decentralized Model-free Reinforcement Learning In Stochastic Games With Average-reward Objective
2023 · Romain Cravic, Nicolas Gast, Bruno Gaujal
Abstract
We propose the first model-free algorithm that achieves low regret for decentralized learning in two-player zero-sum tabular stochastic games with an infinite-horizon average-reward objective. In decentralized learning, the learning agent controls only one player and tries to achieve low regret against an arbitrary opponent. This contrasts with centralized learning, where the agent tries to approximate the Nash equilibrium by controlling both players. In our infinite-horizon undiscounted setting, additional structural assumptions are needed to ensure that the learning process behaves well: here we assume that, for every strategy of the opponent, the agent has a way to go from any state to any other. This assumption is analogous to the "communicating" assumption in the MDP setting. We show that our Decentralized Optimistic Nash Q-Learning (DONQ-learning) algorithm achieves both sublinear high-probability regret of order \(T^{3/4}\) and sublinear expected regret of ord
Related papers
- Regret Minimization And Convergence To Equilibria In General-sum Markov Games (2022)
- Impact Of Decentralized Learning On Player Utilities In Stackelberg Games (2024)
- Decentralized Q-learning In Zero-sum Markov Games (2021)
- A Model-free Learning Algorithm For Infinite-horizon Average-reward MDPs With Near-optimal Regret (2020)
- Online Learning In Unknown Markov Games (2020)
- Regret Bounds For Decentralized Learning In Cooperative Multi-agent Dynamical Systems (2020)
- Learning In Zero-sum Markov Games: Relaxing Strong Reachability And Mixing Time Assumptions (2023)
- Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-regret Learning In Markov Games (2022)