On The Theory Of Policy Gradient Methods: Optimality, Approximation, And Distribution Shift
2019 · Alekh Agarwal, Sham M. Kakade, Jason D. Lee, et al.
Abstract
Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces. However, little is known about even their most basic theoretical convergence properties, including: if and how fast they converge to a globally optimal solution or how they cope with approximation error due to using a restricted class of parametric policies. This work provides provable characterizations of the computational, approximation, and sample size properties of policy gradient methods in the context of discounted Markov Decision Processes (MDPs). We focus on both: "tabular" policy parameterizations, where the optimal policy is contained in the class and where we show global convergence to the optimal policy; and parametric policy classes (considering both log-linear and neural policy classes), which may not contain the optimal policy and where we provide agnostic learning results. One central contribution of this work is in providing
Related papers
- On The Convergence Of Discounted Policy Gradient Methods (2022)
- Global Convergence Of Policy Gradient Methods In Reinforcement Learning, Games And Control (2023)
- On The Linear Convergence Of Natural Policy Gradient Algorithm (2021)
- Policy Gradient In Partially Observable Environments: Approximation And Convergence (2018)
- Analysis Of On-policy Policy Gradient Methods Under The Distribution Mismatch (2025)
- Elementary Analysis Of Policy Gradient Methods (2024)
- Convergence And Optimality Of Policy Gradient Methods In Weakly Smooth Settings (2021)
- A Parametric Class Of Approximate Gradient Updates For Policy Optimization (2022)