Symmetric (Optimistic) Natural Policy Gradient For Multi-agent Learning With Parameter Convergence
2022 · Sarath Pattathil, Kaiqing Zhang, Asuman Ozdaglar
Abstract
Multi-agent interactions are increasingly important in the context of reinforcement learning, and the theoretical foundations of policy gradient methods have attracted surging research interest. We investigate the global convergence of natural policy gradient (NPG) algorithms in multi-agent learning. We first show that vanilla NPG may not have parameter convergence, i.e., the convergence of the vector that parameterizes the policy, even when the costs are regularized (which enabled strong convergence guarantees in the policy space in the literature). This non-convergence of parameters leads to stability issues in learning, which becomes especially relevant in the function approximation setting, where we can only operate on low-dimensional parameters, instead of the high-dimensional policy. We then propose variants of the NPG algorithm, for several standard multi-agent learning scenarios: two-player zero-sum matrix and Markov games, and multi-player monotone games, with global last-iter
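The gap between convergence in the policy space and convergence of the parameters can be illustrated with a minimal sketch. It assumes a softmax parameterization, which the abstract above does not name, and every identifier in it (`softmax`, `shifted`, `theta`) is hypothetical. It is not the paper's algorithm or its counterexample. It only shows that parameter vectors arbitrarily far apart can induce the same policy, so a guarantee on the policy alone does not pin down the parameters.

```python
import math

def softmax(theta):
    """Map a parameter vector to a probability vector (assumed softmax policy)."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def shifted(theta, c):
    """Add the same constant c to every coordinate of theta."""
    return [t + c for t in theta]

theta = [0.5, -1.0, 2.0]  # illustrative parameters, not taken from the paper
policy = softmax(theta)

# Shift the parameters by ever larger constants: the parameter vector
# moves arbitrarily far away, yet the induced policy stays the same.
for c in [1.0, 10.0, 100.0]:
    theta_c = shifted(theta, c)
    policy_c = softmax(theta_c)
    param_gap = max(abs(a - b) for a, b in zip(theta_c, theta))
    policy_gap = max(abs(a - b) for a, b in zip(policy_c, policy))
    print(f"shift={c:6.1f}  parameter gap={param_gap:6.1f}  policy gap={policy_gap:.2e}")
```

Under this assumed parameterization, a sequence of policies can settle while the underlying parameters keep drifting along the all-ones direction, which is one way to picture why parameter convergence is a separate question from policy convergence.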
Related papers
- Provably Fast Convergence Of Independent Natural Policy Gradient For Markov Potential Games (2023)
- Linear Convergence Of Independent Natural Policy Gradient In Games With Entropy Regularization (2024)
- Fast Global Convergence Of Natural Policy Gradient Methods With Entropy Regularization (2020)
- Global Convergence Of Natural Policy Gradient With Hessian-aided Momentum Variance Reduction (2024)
- Independent Natural Policy Gradient Always Converges In Markov Potential Games (2021)
- On The Linear Convergence Of Natural Policy Gradient Algorithm (2021)
- Independent Policy Gradient For Large-scale Markov Potential Games: Sharper Rates, Function Approximation, And Game-agnostic Convergence (2022)
- Dimension-free Rates For Natural Policy Gradient In Multi-agent Reinforcement Learning (2021)