Beyond Exact Gradients: Convergence Of Stochastic Soft-max Policy Gradient Methods With Entropy Regularization
2021 · Yuhao Ding, Junzi Zhang, Hyunin Lee, et al.
Abstract
Entropy regularization is an efficient technique for encouraging exploration and preventing premature convergence of (vanilla) policy gradient methods in reinforcement learning (RL). However, the theoretical understanding of entropy-regularized RL algorithms has been limited. In this paper, we revisit the classical entropy-regularized policy gradient methods with the soft-max policy parametrization, whose convergence has so far only been established assuming access to exact gradient oracles. To go beyond this scenario, we propose the first set of (nearly) unbiased stochastic policy gradient estimators with trajectory-level entropy regularization, one being an unbiased visitation measure-based estimator and the other being a nearly unbiased yet more practical trajectory-based estimator. We prove that although the estimators themselves are unbounded in general due to the additional logarithmic policy rewards introduced by the entropy term, the variances are uniformly bounded.
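For orientation, the setting the abstract refers to can be written in its standard textbook form. This is a sketch under assumed notation, not taken from the paper itself: the symbols $\theta$ (tabular policy parameters), $\lambda$ (regularization weight), $\gamma$ (discount factor), $\rho$ (initial state distribution), and $r$ (reward) are assumptions introduced here for illustration.

```latex
% Tabular soft-max policy parametrization (assumed notation)
\pi_\theta(a \mid s) = \frac{\exp(\theta_{s,a})}{\sum_{a'} \exp(\theta_{s,a'})}

% Entropy-regularized value: the entropy term acts as an extra
% per-step reward of -\lambda \log \pi_\theta(a_t \mid s_t)
V_\lambda^{\pi_\theta}(\rho)
  = \mathbb{E}_{s_0 \sim \rho,\; a_t \sim \pi_\theta(\cdot \mid s_t)}
    \left[ \sum_{t=0}^{\infty} \gamma^{t}
      \bigl( r(s_t, a_t) - \lambda \log \pi_\theta(a_t \mid s_t) \bigr) \right]
```

Under this form, the term $-\lambda \log \pi_\theta(a_t \mid s_t)$ grows without bound as $\pi_\theta(a_t \mid s_t)$ approaches zero, which is consistent with the abstract's remark that the logarithmic policy rewards make the estimators unbounded in general.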
Related papers
- Fast Global Convergence Of Natural Policy Gradient Methods With Entropy Regularization (2020)
- Policy Optimization Reinforcement Learning With Entropy Regularization (2019)
- Matryoshka Policy Gradient For Entropy-regularized RL: Convergence And Global Optimality (2023)
- Approximate Newton Policy Gradient Algorithms (2021)
- Understanding The Impact Of Entropy On Policy Optimization (2018)
- Entropy-augmented Entropy-regularized Reinforcement Learning And A Continuous Path From Policy Gradient To Q-learning (2020)
- Linear Convergence Of Independent Natural Policy Gradient In Games With Entropy Regularization (2024)
- Marginalized State Distribution Entropy Regularization In Policy Optimization (2019)