Independent Policy Gradient Methods For Competitive Reinforcement Learning
2021 · Constantinos Daskalakis, Dylan J. Foster, Noah Golowich
Abstract
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent policy gradient methods in competitive RL; prior work has largely focused on centralized, coordinated procedures for equilibrium computation.
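As a rough illustration of the two-timescale idea described in the abstract, the sketch below runs simultaneous projected gradient descent/ascent on a single-state zero-sum game (a 2x2 matrix game). It is not the paper's algorithm: it uses exact gradients rather than episodic samples, and the matrix `A`, the step sizes `eta_x` and `eta_y`, the choice of the min player as the slower learner, and all helper names are assumptions made for this example. Each player updates using only the gradient of its own payoff, and the game is chosen to have a pure saddle point so the run is easy to check by hand.

```python
def project_to_simplex(v):
    """Euclidean projection of the vector v onto the probability simplex."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold from the last index that passes
    return [max(x - theta, 0.0) for x in v]


def game_value(A, x, y):
    """Expected loss x^T A y paid by the min player to the max player."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def two_timescale_gda(A, eta_x, eta_y, steps):
    """Simultaneous projected gradient play with separate step sizes.

    x is the min player's mixed policy over rows, y is the max player's
    mixed policy over columns. Illustrative assumption: eta_x < eta_y,
    i.e. the min player is the slow learner.
    """
    n, m = len(A), len(A[0])
    x = [1.0 / n] * n
    y = [1.0 / m] * m
    for _ in range(steps):
        # Each player only needs the gradient of its own objective.
        grad_x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        grad_y = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        new_x = project_to_simplex([x[i] - eta_x * grad_x[i] for i in range(n)])
        new_y = project_to_simplex([y[j] + eta_y * grad_y[j] for j in range(m)])
        x, y = new_x, new_y
    return x, y


# Hypothetical 2x2 game: row 0 is dominant for the min player,
# column 1 is dominant for the max player.
A = [[1.0, 2.0],
     [3.0, 4.0]]
x, y = two_timescale_gda(A, eta_x=0.01, eta_y=0.1, steps=2000)
print(x, y, game_value(A, x, y))
```

With this matrix both policies settle on the dominant pure strategies and the value approaches 2. This toy case says nothing about games with only mixed equilibria, where simultaneous gradient play can cycle.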
Related papers
- Independent Policy Gradient For Large-scale Markov Potential Games: Sharper Rates, Function Approximation, And Game-agnostic Convergence (2022)
- Independent Natural Policy Gradient Always Converges In Markov Potential Games (2021)
- On The Convergence Of Policy Gradient Methods To Nash Equilibria In General Stochastic Games (2022)
- Convergence Analysis Of Gradient-based Learning With Non-uniform Learning Rates In Non-cooperative Multi-agent Settings (2019)
- Global Convergence Of Policy Gradient Methods In Reinforcement Learning, Games And Control (2023)
- Last-iterate Convergence Of Decentralized Optimistic Gradient Descent/ascent In Infinite-horizon Competitive Markov Games (2021)
- On The Second-order Convergence Of Biased Policy Gradient Algorithms (2023)
- Optimistic Policy Gradient In Multi-player Markov Games With A Single Controller: Convergence Beyond The Minty Property (2023)