On The Convergence Of Discounted Policy Gradient Methods
2022 · Chris Nota
Abstract
Many popular policy gradient methods for reinforcement learning follow a biased approximation of the policy gradient known as the discounted approximation. While it has been shown that the discounted approximation of the policy gradient is not the gradient of any objective function, little else is known about its convergence behavior or properties. In this paper, we show that if the discounted approximation is followed such that the discount factor is increased slowly at a rate related to a decreasing learning rate, the resulting method recovers the standard guarantees of gradient ascent on the undiscounted objective.
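The idea in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's algorithm: the abstract does not give the exact rate condition, so the step-size schedule `learning_rate` and the coupling `discount_factor` (gamma_k = 1 - alpha_k) are hypothetical choices. The estimator uses the commonly described form of the discounted approximation, in which each score term is weighted by the discounted return-to-go but the outer gamma^t weight is dropped.

```python
def learning_rate(k, alpha0=0.5):
    # Hypothetical decreasing step size: alpha_k = alpha0 / sqrt(k + 1).
    return alpha0 / (k + 1) ** 0.5


def discount_factor(k, alpha0=0.5):
    # Hypothetical coupling (an assumption, not the paper's stated rate):
    # gamma_k = 1 - alpha_k, so gamma_k rises toward 1 as alpha_k shrinks.
    return 1.0 - learning_rate(k, alpha0)


def returns_to_go(rewards, gamma):
    # G_t = sum over j >= t of gamma^(j - t) * r_j, computed backwards.
    out = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    out.reverse()
    return out


def discounted_approximation(grad_log_probs, rewards, gamma):
    # Single-episode estimate: sum over t of grad log pi(a_t | s_t) * G_t.
    # There is no gamma^t weight on the outer sum, which is what makes
    # this the biased "discounted approximation".
    returns = returns_to_go(rewards, gamma)
    dim = len(grad_log_probs[0])
    grad = [0.0] * dim
    for score, g_t in zip(grad_log_probs, returns):
        for i in range(dim):
            grad[i] += score[i] * g_t
    return grad


def ascent_step(theta, grad_log_probs, rewards, k):
    # One update at iteration k using the coupled schedules above.
    alpha = learning_rate(k)
    gamma = discount_factor(k)
    grad = discounted_approximation(grad_log_probs, rewards, gamma)
    return [t + alpha * g for t, g in zip(theta, grad)]
```

With gamma equal to 1 the estimate coincides with the usual undiscounted episodic estimate, so a schedule that pushes gamma toward 1 makes the bias vanish in the limit. How fast gamma may rise relative to the learning rate is the question the paper addresses.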
Related papers
- On The Theory Of Policy Gradient Methods: Optimality, Approximation, And Distribution Shift (2019)
- On The Second-order Convergence Of Biased Policy Gradient Algorithms (2023)
- Global Convergence Of Policy Gradient Methods In Reinforcement Learning, Games And Control (2023)
- Convergence And Optimality Of Policy Gradient Methods In Weakly Smooth Settings (2021)
- On The Linear Convergence Of Natural Policy Gradient Algorithm (2021)
- Policy Gradient In Partially Observable Environments: Approximation And Convergence (2018)
- Analysis Of On-policy Policy Gradient Methods Under The Distribution Mismatch (2025)
- Linear Convergence Of A Policy Gradient Method For Some Finite Horizon Continuous Time Control Problems (2022)