Linear-quadratic Mean-field Reinforcement Learning: Convergence Of Policy Gradient Methods
2019 · René Carmona, Mathieu Laurière, Zongjun Tan
Abstract
We investigate reinforcement learning in the setting of Markov decision processes for a large number of exchangeable agents interacting in a mean field manner. Applications include, for example, the control of a large number of robots communicating through a central unit dispatching the optimal policy computed by maximizing an aggregate reward. An approximate solution is obtained by learning the optimal policy of a generic agent interacting with the statistical distribution of the states and actions of the other agents. We first provide a full analysis of this discrete-time mean field control problem. We then rigorously prove the convergence of exact and model-free policy gradient methods in a mean-field linear-quadratic setting and establish bounds on the rates of convergence. We also provide graphical evidence of the convergence based on implementations of our algorithms.
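To make the idea of policy gradient in a mean-field linear-quadratic setting concrete, here is a minimal scalar sketch. It is not the paper's algorithm. It assumes a noiseless system, a linear policy u = -K (x - mean) - L mean, and a cost that splits into two independent LQ problems: one for the deviation from the mean with parameters (a, b, q, r), and one for the mean with parameters (a + a_bar, b + b_bar, q + q_bar, r + r_bar). All function and parameter names are hypothetical.

```python
import math


def lq_cost(k, a, b, q, r, second_moment=1.0):
    """Cost sum_t (q x_t^2 + r u_t^2) for x_{t+1} = a x_t + b u_t, u_t = -k x_t.

    second_moment stands for E[x_0^2]; the cost is infinite if k is not stabilizing.
    """
    rho = a - b * k  # closed-loop multiplier
    if abs(rho) >= 1.0:
        return math.inf
    return second_moment * (q + r * k * k) / (1.0 - rho * rho)


def lq_grad(k, a, b, q, r, second_moment=1.0):
    """Exact derivative of lq_cost with respect to k (valid for stabilizing k)."""
    rho = a - b * k
    d = 1.0 - rho * rho
    return second_moment * (2.0 * r * k * d - 2.0 * b * rho * (q + r * k * k)) / (d * d)


def two_point_grad(k, cost, delta=1e-4):
    """Model-free flavour: estimate the derivative from two cost queries only."""
    return (cost(k + delta) - cost(k - delta)) / (2.0 * delta)


def riccati_gain(a, b, q, r, iters=1000):
    """Reference optimal gain from the scalar Riccati fixed-point iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def mean_field_policy_gradient(a, a_bar, b, b_bar, q, q_bar, r, r_bar,
                               step=0.005, iters=5000):
    """Gradient descent on (K, L), started from K = L = 0.

    Assumes |a| < 1 and |a + a_bar| < 1 so that the zero gains are stabilizing.
    """
    K, L = 0.0, 0.0
    for _ in range(iters):
        # Deviation-from-mean part uses (a, b, q, r).
        K -= step * lq_grad(K, a, b, q, r)
        # Mean part uses the aggregated parameters.
        L -= step * lq_grad(L, a + a_bar, b + b_bar, q + q_bar, r + r_bar)
    return K, L


if __name__ == "__main__":
    K, L = mean_field_policy_gradient(0.9, 0.05, 0.5, 0.1, 1.0, 0.5, 1.0, 0.2)
    print("learned gains:", K, L)
    print("Riccati gains:", riccati_gain(0.9, 0.5, 1.0, 1.0),
          riccati_gain(0.95, 0.6, 1.5, 1.2))
```

In this toy setting the learned gains can be compared with the Riccati gains of the two auxiliary problems. The two-point estimator only hints at the model-free variant, which in practice would rely on noisy cost estimates from sampled trajectories rather than exact cost evaluations.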
Related papers
- Global Convergence Of Policy Gradient For Linear-quadratic Mean-field Control/game In Continuous Time (2020)
- Full Error Analysis Of Policy Gradient Learning Algorithms For Exploratory Linear Quadratic Mean-field Control Problem In Continuous Time With Common Noise (2024)
- Reinforcement Learning In Linear Quadratic Deep Structured Teams: Global Convergence Of Policy Gradient Methods (2020)
- Model-free Mean-field Reinforcement Learning: Mean-field MDP And Mean-field Q-learning (2019)
- Global Convergence Using Policy Gradient Methods For Model-free Markovian Jump Linear Quadratic Control (2021)
- Mean Field Multi-agent Reinforcement Learning (2018)
- On The Linear Convergence Of Natural Policy Gradient Algorithm (2021)
- Actor-critic Learning For Mean-field Control In Continuous Time (2023)