Enforcing KL Regularization In General Tsallis Entropy Reinforcement Learning Via Advantage Learning
2022 · Lingwei Zhu, Zheng Chen, Eiji Uchibe, et al.
Abstract
The maximum Tsallis entropy (MTE) framework in reinforcement learning has gained popularity recently by virtue of its flexible modeling choices, which include the widely used Shannon entropy and sparse entropy. However, non-Shannon entropies suffer from approximation error and subsequent underperformance, due either to their sensitivity or to the lack of a closed-form policy expression. To improve the tradeoff between flexibility and empirical performance, we propose to strengthen their error-robustness by enforcing implicit Kullback-Leibler (KL) regularization in MTE, motivated by Munchausen DQN (MDQN). We do so by drawing a connection between MDQN and advantage learning, through which MDQN is shown to fail to generalize to the MTE framework. The proposed method, Tsallis Advantage Learning (TAL), is verified in extensive experiments to not only significantly improve upon Tsallis-DQN for various non-closed-form Tsallis entropies, but also exhibit comparable performance to state-of-the-art maximum Shannon entr
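The flexibility the abstract refers to can be illustrated with a minimal sketch of the Tsallis entropy of a discrete policy. It assumes the common parameterization S_q(π) = k/(q-1) · (1 - Σ_a π(a)^q), which may differ from the paper's exact normalization. The function name `tsallis_entropy`, the scale `k`, and the example policy are hypothetical and chosen only for illustration. Under this assumption, q → 1 recovers (k times) the Shannon entropy, and q = 2 with k = 1/2 gives the sparse entropy.

```python
import math

def tsallis_entropy(probs, q, k=1.0):
    """Tsallis entropy k/(q-1) * (1 - sum_a p(a)^q) of a discrete distribution.

    Assumes q > 0 and that probs sums to 1; this normalization is an
    illustrative assumption, not necessarily the paper's convention.
    """
    if abs(q - 1.0) < 1e-8:
        # q -> 1 limit: k times the Shannon entropy (0 * log 0 treated as 0).
        return -k * sum(p * math.log(p) for p in probs if p > 0.0)
    return k / (q - 1.0) * (1.0 - sum(p ** q for p in probs))

# Hypothetical three-action policy, for illustration only.
policy = [0.7, 0.2, 0.1]

shannon = tsallis_entropy(policy, q=1.0)              # Shannon entropy
near_shannon = tsallis_entropy(policy, q=1.0 + 1e-6)  # general formula close to q = 1
sparse = tsallis_entropy(policy, q=2.0, k=0.5)        # sparse entropy: 0.5 * (1 - sum p^2)
```

Evaluating the general formula just above q = 1 gives a value close to the Shannon branch, which is a quick numerical check that the two cases agree in the limit.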
Related papers
- Tsallis Reinforcement Learning: A Unified Framework For Maximum Entropy Reinforcement Learning (2019)
- Generalized Munchausen Reinforcement Learning Using Tsallis KL Divergence (2023)
- Effective Exploration For Deep Reinforcement Learning Via Bootstrapped Q-ensembles Under Tsallis Entropy Regularization (2018)
- Path Consistency Learning In Tsallis Entropy Regularized MDPs (2018)
- Do You Need The Entropy Reward (in Practice)? (2022)
- Entropy Regularized Reinforcement Learning Using Large Deviation Theory (2021)
- Your Policy Regularizer Is Secretly An Adversary (2022)
- Optimal Scheduling Of Entropy Regulariser For Continuous-time Linear-quadratic Reinforcement Learning (2022)