Do You Need The Entropy Reward (in Practice)?
2022 · Haonan Yu, Haichao Zhang, Wei Xu
Abstract
Maximum entropy (MaxEnt) RL maximizes a combination of the original task reward and an entropy reward. It is believed that the regularization imposed by entropy, on both policy improvement and policy evaluation, together contributes to good exploration, training convergence, and robustness of learned policies. This paper takes a closer look at entropy as an intrinsic reward, by conducting various ablation studies on soft actor-critic (SAC), a popular representative of MaxEnt RL. Our findings reveal that in general, entropy rewards should be applied with caution to policy evaluation. On one hand, the entropy reward, like any other intrinsic reward, could obscure the main task reward if it is not properly managed. We identify some failure cases of the entropy reward especially in episodic Markov decision processes (MDPs), where it could cause the policy to be overly optimistic or pessimistic. On the other hand, our large-scale empirical study shows that using entropy regularization alone
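The abstract separates two places where SAC uses entropy. In policy evaluation, the bonus -alpha * log pi(a'|s') is added inside the critic's bootstrap target. In policy improvement, the same term regularizes the actor's loss. The sketch below is a minimal illustration in plain Python. It assumes single-sample scalar quantities and a fixed temperature alpha. The function names, the `use_entropy_reward` flag, and the constant-entropy helper are hypothetical and are not the authors' code. Setting `use_entropy_reward=False` keeps entropy only in policy improvement, which is the kind of ablation the abstract describes.

```python
import math


def soft_td_target(reward, next_q, next_log_prob, done,
                   gamma=0.99, alpha=0.2, use_entropy_reward=True):
    """One-sample SAC-style critic target (illustrative sketch).

    next_q and next_log_prob are evaluated at a' ~ pi(.|s').
    use_entropy_reward=True: the entropy bonus -alpha * log pi(a'|s')
    enters policy evaluation (standard SAC).
    use_entropy_reward=False: a plain Bellman backup, so entropy only
    regularizes policy improvement (hypothetical ablation flag).
    """
    bootstrap = next_q
    if use_entropy_reward:
        bootstrap = next_q - alpha * next_log_prob
    # done=1.0 marks a terminal transition: no bootstrapping, so no future bonus
    return reward + gamma * (1.0 - done) * bootstrap


def actor_loss(q_value, log_prob, alpha=0.2):
    """Policy-improvement loss to minimize: alpha * log pi(a|s) - Q(s, a)."""
    return alpha * log_prob - q_value


def discounted_entropy_bonus(entropy, horizon, gamma=0.99, alpha=0.2):
    """Discounted sum of a constant per-step bonus alpha * entropy over
    `horizon` steps (simplifying assumption: entropy is constant in time)."""
    return alpha * entropy * (1.0 - gamma ** horizon) / (1.0 - gamma)


if __name__ == "__main__":
    # Same transition, with and without the entropy reward in the target
    print(soft_td_target(1.0, 10.0, -2.0, 0.0, gamma=0.9, alpha=0.5))
    print(soft_td_target(1.0, 10.0, -2.0, 0.0, gamma=0.9, alpha=0.5,
                         use_entropy_reward=False))
    # Positive vs negative per-step entropy over short and long episodes
    for h in (1.0, -1.0):
        print(h, discounted_entropy_bonus(h, 10), discounted_entropy_bonus(h, 100))
```

Under these assumptions, a positive per-step entropy gives longer episodes a larger accumulated bonus, so surviving can look better than the task reward alone warrants. The differential entropy of a continuous policy can be negative. In that case the bonus becomes a penalty that grows with episode length, and early termination can look attractive. This is one way to read the "overly optimistic or pessimistic" failure cases in episodic MDPs. It is not a reproduction of the paper's own analysis.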
Related papers
- Off-policy Maximum Entropy RL With Future State And Action Visitation Measures (2024)
- Maximum Entropy RL (provably) Solves Some Robust RL Problems (2021)
- Understanding The Impact Of Entropy On Policy Optimization (2018)
- Off-policy Actor-critic In An Ensemble: Achieving Maximum General Entropy And Effective Environment Exploration In Deep Reinforcement Learning (2019)
- S²AC: Energy-based Reinforcement Learning With Stein Soft Actor Critic (2024)
- Marginalized State Distribution Entropy Regularization In Policy Optimization (2019)
- Maximum-entropy Exploration With Future State-action Visitation Measures (2026)
- Action Redundancy In Reinforcement Learning (2021)