Entropy-Augmented Entropy-Regularized Reinforcement Learning and a Continuous Path from Policy Gradient to Q-Learning
2020 · Donghoon Lee
Abstract
Augmenting the reward with entropy is known to soften the greedy argmax policy into a softmax policy. Reformulating entropy augmentation motivates adding a further entropy term to the objective function, in the form of a KL divergence, to regularize the optimization process. This yields a policy that improves monotonically while interpolating from the current policy to the softmax greedy policy. This policy is used to build a continuously parameterized algorithm that optimizes the policy and the Q-function simultaneously and whose extreme limits correspond to policy gradient and Q-learning, respectively. Experiments show that an intermediate algorithm can yield a performance gain.
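The central step described above, an entropy bonus combined with a KL penalty toward the current policy, has a well-known closed form for a single state. The sketch below assumes the per-state objective E_π[Q] + α H(π) − β KL(π ‖ π_old), where `alpha`, `beta` and all function names are hypothetical illustrations rather than the paper's notation; it shows the general mechanism and is not necessarily the paper's exact formulation. Setting the gradient of the Lagrangian to zero gives π ∝ π_greedy^(1−η) · π_old^η with η = β/(α+β) and π_greedy = softmax(Q/α), i.e. a geometric interpolation between the softmax greedy policy (β = 0) and the current policy (β → ∞).

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def regularized_objective(pi, q, pi_old, alpha, beta) -> float:
    """Assumed single-state objective: E_pi[Q] + alpha*H(pi) - beta*KL(pi || pi_old)."""
    value = 0.0
    for p, qa, po in zip(pi, q, pi_old):
        if p > 0.0:  # terms with p == 0 contribute 0 (0 * log 0 := 0)
            value += p * qa - alpha * p * math.log(p) - beta * p * math.log(p / po)
    return value


def kl_regularized_policy(q, pi_old, alpha, beta) -> List[float]:
    """Maximizer of regularized_objective over the probability simplex.

    Closed form: pi(a) ∝ exp((q(a) + beta * log pi_old(a)) / (alpha + beta)).
    beta = 0 gives softmax(q / alpha); large beta keeps pi close to pi_old.
    """
    if alpha <= 0.0 or beta < 0.0:
        raise ValueError("need alpha > 0 and beta >= 0")
    if any(p <= 0.0 for p in pi_old):
        raise ValueError("pi_old must assign positive probability to every action")
    tau = alpha + beta  # effective temperature
    logits = [(qa + beta * math.log(po)) / tau for qa, po in zip(q, pi_old)]
    return softmax(logits)


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]          # hypothetical Q-values for three actions
    pi_old = [0.7, 0.2, 0.1]     # hypothetical current policy
    for beta in (0.0, 0.5, 2.0, 50.0):
        pi = kl_regularized_policy(q, pi_old, alpha=0.5, beta=beta)
        print(beta, [round(p, 3) for p in pi])
```

Sweeping `beta` from 0 upward moves the result from the softmax greedy policy toward `pi_old`, which is the kind of continuous interpolation the abstract refers to; how the paper ties this interpolation to the policy-gradient and Q-learning limits is not reproduced here.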
Related papers
- Policy Optimization Reinforcement Learning With Entropy Regularization (2019)
- Beyond Exact Gradients: Convergence Of Stochastic Soft-max Policy Gradient Methods With Entropy Regularization (2021)
- Optimal Scheduling Of Entropy Regulariser For Continuous-time Linear-quadratic Reinforcement Learning (2022)
- Matryoshka Policy Gradient For Entropy-regularized RL: Convergence And Global Optimality (2023)
- Understanding The Impact Of Entropy On Policy Optimization (2018)
- Marginalized State Distribution Entropy Regularization In Policy Optimization (2019)
- Equivalence Between Policy Gradients And Soft Q-learning (2017)
- Soft Policy Gradient Method For Maximum Entropy Deep Reinforcement Learning (2019)