Convergence Of Policy Gradient Methods For Finite-horizon Exploratory Linear-quadratic Control Problems
2022 · Michael Giegrich, Christoph Reisinger, Yufei Zhang
Abstract
We study the global linear convergence of policy gradient (PG) methods for finite-horizon continuous-time exploratory linear-quadratic control (LQC) problems. The setting includes stochastic LQC problems with indefinite costs and allows additional entropy regularisers in the objective. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. In contrast to discrete-time problems, the cost is noncoercive in the policy, and not all descent directions lead to bounded iterates. We propose geometry-aware gradient descent schemes for the mean and covariance of the policy, using the Fisher geometry and the Bures-Wasserstein geometry, respectively. The policy iterates are shown to satisfy an a priori bound and to converge globally to the optimal policy at a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis and achieves a robust linear convergence acr
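To make the two geometries concrete, here is a minimal sketch on a hypothetical one-period scalar toy problem, not the paper's finite-horizon continuous-time model. The policy is Gaussian, a ~ N(k x, s), with a mean linear in the state and a state-independent variance, and the cost includes an entropy regulariser with weight tau. The mean parameter k takes a natural-gradient-style step (the Euclidean gradient divided by the state second moment v), and the variance s takes a step of the form (1 - eta*G) s (1 - eta*G), one common way to write a Bures-Wasserstein gradient step. The toy model and all names (q, r, b, g_T, tau, v, pg_step) are assumptions for illustration; this is not the authors' algorithm.

```python
import math

# Hypothetical one-period scalar toy problem (NOT the paper's model):
#   state x ~ N(0, v), action a ~ N(k*x, s), next state x' = x + b*a,
#   J(k, s) = E[q x^2 + r a^2 + g_T x'^2] - tau * entropy(N(0, s)).
q, r, b, g_T, tau, v = 1.0, 1.0, 1.0, 1.0, 1.0, 2.0
c = r + g_T * b**2  # curvature of the cost in the action


def cost(k: float, s: float) -> float:
    """Closed-form expected cost J(k, s) of the toy problem."""
    entropy = 0.5 * math.log(2 * math.pi * math.e * s)
    return (q * v + r * (k**2 * v + s)
            + g_T * ((1 + b * k) ** 2 * v + b**2 * s)
            - tau * entropy)


def grad_k(k: float) -> float:
    # dJ/dk = 2 v (c k + g_T b)
    return 2 * v * (c * k + g_T * b)


def grad_s(s: float) -> float:
    # dJ/ds = c - tau / (2 s)
    return c - tau / (2 * s)


def pg_step(k: float, s: float, eta: float) -> tuple[float, float]:
    # Mean: precondition the Euclidean gradient by 1 / v. The Fisher
    # information of k under this policy is v / s; dropping the 1/s
    # factor is a simplifying choice for this sketch.
    k_new = k - eta * grad_k(k) / v
    # Variance: Bures-Wasserstein-type congruence step
    # (1 - eta*G) s (1 - eta*G), which keeps s >= 0 by construction.
    factor = 1 - eta * grad_s(s)
    s_new = factor * s * factor
    return k_new, s_new


k, s = 0.0, 1.0
for _ in range(100):
    k, s = pg_step(k, s, eta=0.1)

k_star = -g_T * b / c   # minimiser in k
s_star = tau / (2 * c)  # minimiser in s
print(k, k_star, s, s_star)
```

Here grad_s(s) < c for every s > 0, so any eta < 1/c keeps the factor strictly positive and the variance stays positive without a projection step. Both parameters then contract linearly (near the optimum, by a factor of about 1 - 2*eta*c = 0.6 per step) toward k_star = -0.5 and s_star = 0.25, which loosely mirrors, in this toy setting only, the linear rate the abstract states for the full problem.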
Related papers
- Full Error Analysis Of Policy Gradient Learning Algorithms For Exploratory Linear Quadratic Mean-field Control Problem In Continuous Time With Common Noise (2024)
- Linear Convergence Of A Policy Gradient Method For Some Finite Horizon Continuous Time Control Problems (2022)
- Fast Policy Learning For Linear Quadratic Control With Entropy Regularization (2023)
- Global Convergence Of Policy Gradient For Linear-quadratic Mean-field Control/game In Continuous Time (2020)
- Global Convergence Using Policy Gradient Methods For Model-free Markovian Jump Linear Quadratic Control (2021)
- Convergence Guarantees Of Policy Optimization Methods For Markovian Jump Linear Systems (2020)
- Some Remarks On Gradient Dominance And LQR Policy Optimization (2025)
- Revisiting LQR Control From The Perspective Of Receding-horizon Policy Gradient (2023)