Policy Optimization For Continuous-time Linear-quadratic Graphon Mean Field Games
2025 · Philipp Plank, Yufei Zhang
Abstract
Multi-agent reinforcement learning, despite its popularity and empirical success, faces significant scalability challenges in large-population dynamic games. Graphon mean field games (GMFGs) offer a principled framework for approximating such games while capturing heterogeneity among players. In this paper, we propose and analyze a policy optimization framework for continuous-time, finite-horizon linear-quadratic GMFGs. Exploiting the structural properties of GMFGs, we design an efficient policy parameterization in which each player's policy is represented as an affine function of their private state, with a shared slope function and player-specific intercepts. We develop a bilevel optimization algorithm that alternates between policy gradient updates for best-response computation under a fixed population distribution, and distribution updates using the resulting policies. We prove linear convergence of the policy gradient steps to best-response policies and establish global convergenc
Related papers
- Global Convergence Of Policy Gradient For Linear-quadratic Mean-field Control/game In Continuous Time (2020)
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022)
- Learning Regularized Graphon Mean-field Games With Unknown Graphons (2023)
- Empirical Policy Optimization For \(n\)-player Markov Games (2021)
- A General Framework For Learning Mean-field Games (2020)
- Linear-quadratic Mean-field Reinforcement Learning: Convergence Of Policy Gradient Methods (2019)
- Reinforcement Learning In Nonzero-sum Linear Quadratic Deep Structured Games: Global Convergence Of Policy Optimization (2020)
- A Single Online Agent Can Efficiently Learn Mean Field Games (2024)