Reinforcement Learning In Nonzero-sum Linear Quadratic Deep Structured Games: Global Convergence Of Policy Optimization
2020 · Masoud Roudneshin, Jalal Arabneydi, Amir G. Aghdam
Abstract
We study model-based and model-free policy optimization in a class of nonzero-sum stochastic dynamic games called linear quadratic (LQ) deep structured games. In such games, players interact with each other through a set of weighted averages (linear regressions) of the states and actions. In this paper, we focus on homogeneous weights; however, for the special case of an infinite population, the obtained results extend to asymptotically vanishing weights, wherein the players learn the sequential weighted mean-field equilibrium. Despite the non-convexity of the optimization in policy space and the fact that policy optimization does not generally converge in game settings, we prove that the proposed model-based and model-free policy gradient descent and natural policy gradient descent algorithms globally converge to the sub-game perfect Nash equilibrium. To the best of our knowledge, this is the first result that provides a global convergence proof of policy optimization in a no
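As a rough illustration of the coupling described in the abstract, the sketch below computes the weighted averages of the players' states and actions under homogeneous weights. The function names, the scalar per-player values, and the choice of four players are assumptions made for this example only; they are not taken from the paper, and the sketch does not implement any of its algorithms.

```python
def weighted_average(values, weights):
    """Return sum_i weights[i] * values[i] for scalar per-player values."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * v for w, v in zip(weights, values))


def homogeneous_weights(n):
    """Equal weights 1/n for each of n players (the homogeneous case)."""
    return [1.0 / n] * n


# Hypothetical scalar states and actions for n = 4 players (illustrative data).
states = [1.0, 2.0, 3.0, 6.0]
actions = [0.5, -0.5, 1.0, 1.0]
weights = homogeneous_weights(len(states))

# Aggregates through which the players are coupled in this toy setup.
mean_state = weighted_average(states, weights)
mean_action = weighted_average(actions, weights)
```

With homogeneous weights the aggregates reduce to the empirical means of the states and actions, so each player's influence on them shrinks as the population grows.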
Related papers
- Reinforcement Learning In Linear Quadratic Deep Structured Teams: Global Convergence Of Policy Gradient Methods (2020)
- Learning Zero-sum Linear Quadratic Games With Improved Sample Complexity And Last-iterate Convergence (2023)
- Policy-gradient Algorithms Have No Guarantees Of Convergence In Linear Quadratic Games (2019)
- Policy Optimization For Continuous-time Linear-quadratic Graphon Mean Field Games (2025)
- Global Convergence Of Policy Gradient For Linear-quadratic Mean-field Control/game In Continuous Time (2020)
- Learning Distributed Equilibria In Linear-quadratic Stochastic Differential Games: An \(\alpha\)-potential Approach (2026)
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022)
- Some Remarks On Gradient Dominance And LQR Policy Optimization (2025)