Learning Zero-sum Linear Quadratic Games With Improved Sample Complexity And Last-iterate Convergence
2023 · Jiduan Wu, Anas Barakat, Ilyas Fatkhullin, et al.
Abstract
Zero-sum Linear Quadratic (LQ) games are fundamental in optimal control and can be used (i) as a dynamic game formulation for risk-sensitive or robust control and (ii) as a benchmark setting for multi-agent reinforcement learning with two competing agents in continuous state-control spaces. In contrast to the well-studied single-agent linear quadratic regulator problem, zero-sum LQ games entail solving a challenging nonconvex-nonconcave min-max problem with an objective function that lacks coercivity. Recently, Zhang et al. showed that an \(\epsilon\)-Nash equilibrium (NE) of finite horizon zero-sum LQ games can be learned via nested model-free Natural Policy Gradient (NPG) algorithms with poly\((1/\epsilon)\) sample complexity. In this work, we propose a simpler nested Zeroth-Order (ZO) algorithm improving sample complexity by several orders of magnitude and guaranteeing convergence of the last iterate. Our main results are two-fold: (i) in the deterministic setting, we establish the
Related papers
- Reinforcement Learning In Nonzero-sum Linear Quadratic Deep Structured Games: Global Convergence Of Policy Optimization (2020) 6.77
- Improving Sample Efficiency Of Model-free Algorithms For Zero-sum Markov Games (2023) 0.00
- Last-iterate Convergence Of Payoff-based Independent Learning In Zero-sum Stochastic Games (2024) 0.00
- Learning Distributed Equilibria In Linear-quadratic Stochastic Differential Games: An \(\alpha\)-potential Approach (2026) 0.00
- Policy-gradient Algorithms Have No Guarantees Of Convergence In Linear Quadratic Games (2019) 5.24
- A Generalized Minimax Q-learning Algorithm For Two-player Zero-sum Stochastic Games (2019) 9.03
- Teaching An Old Dynamics New Tricks: Regularization-free Last-iterate Convergence In Zero-sum Games Via BNN Dynamics (2026) 0.00
- Learning Nash Equilibria In Zero-sum Stochastic Games Via Entropy-regularized Policy Approximation (2020) 0.00