Natural Actor-critic Converges Globally For Hierarchical Linear Quadratic Regulator
2019 · Yuwei Luo, Zhuoran Yang, Zhaoran Wang, et al.
Abstract
Multi-agent reinforcement learning has been successfully applied to a number of challenging problems. Despite these empirical successes, theoretical understanding of different algorithms is lacking, primarily due to the curse of dimensionality caused by the exponential growth of the state-action space with the number of agents. We study a fundamental problem of multi-agent linear quadratic regulator (LQR) in a setting where the agents are partially exchangeable. In this setting, we develop a hierarchical actor-critic algorithm, whose computational complexity is independent of the total number of agents, and prove its global linear convergence to the optimal policy. As LQRs are often used to approximate general dynamic systems, this paper provides an important step towards a better understanding of general hierarchical mean-field multi-agent reinforcement learning.
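As a rough illustration of the kind of update the abstract refers to, the sketch below runs an exact, model-based natural policy gradient iteration on a small single-agent LQR. It is not the paper's hierarchical, model-free actor-critic: it assumes the dynamics matrices are known, replaces the critic with an exact Lyapunov solve, and uses made-up values for `A`, `B`, `Q`, `R`, and the step size `eta`. All function names are hypothetical.

```python
import numpy as np

def policy_cost_matrix(A, B, Q, R, K, iters=500):
    """Cost matrix P_K of the policy u = -K x, found by fixed-point
    iteration on P = Q + K^T R K + (A - B K)^T P (A - B K).
    Assumes K is stabilizing, so the iteration converges."""
    A_cl = A - B @ K
    P = np.zeros_like(Q)
    for _ in range(iters):
        P = Q + K.T @ R @ K + A_cl.T @ P @ A_cl
    return P

def natural_pg_step(A, B, Q, R, K, eta):
    """One exact natural policy gradient step: K <- K - 2 * eta * E_K,
    where E_K = (R + B^T P_K B) K - B^T P_K A."""
    P = policy_cost_matrix(A, B, Q, R, K)
    E = (R + B.T @ P @ B) @ K - B.T @ P @ A
    return K - 2.0 * eta * E

def riccati_gain(A, B, Q, R, iters=1000):
    """Reference optimal gain from iterating the discrete-time Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ G
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Illustrative (assumed) problem: A is stable, so K = 0 is a stabilizing start.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

K = np.zeros((1, 2))
for _ in range(200):
    K = natural_pg_step(A, B, Q, R, K, eta=0.05)

K_star = riccati_gain(A, B, Q, R)
print("gain error:", np.linalg.norm(K - K_star))
```

In this toy setting the iterates approach the Riccati gain `K_star`; the paper's contribution concerns the harder case where the critic is estimated from data and the agents are partially exchangeable.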
Related papers
- Global Convergence Of Two-timescale Actor-critic For Solving Linear Quadratic Regulator (2022)
- Multi-agent Natural Actor-critic Reinforcement Learning Algorithms (2021)
- Single-timescale Actor-critic Provably Finds Globally Optimal Policy (2020)
- Actor-attention-critic For Multi-agent Reinforcement Learning (2018)
- Actor-critic Algorithms For Constrained Multi-agent Reinforcement Learning (2019)
- Distributed Q-learning With State Tracking For Multi-agent Networked Control (2020)
- Local Advantage Actor-critic For Robust Multi-agent Deep Reinforcement Learning (2021)
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)