Tsallis Reinforcement Learning: A Unified Framework For Maximum Entropy Reinforcement Learning
2019 · Kyungjae Lee, Sungyub Kim, Sungbin Lim, et al.
Abstract
In this paper, we present a new class of Markov decision processes (MDPs), called Tsallis MDPs, with Tsallis entropy maximization, which generalizes existing maximum entropy reinforcement learning (RL). A Tsallis MDP provides a unified framework for the original RL problem and RL with various types of entropy, including the well-known standard Shannon-Gibbs (SG) entropy, using an additional real-valued parameter, called an entropic index. By controlling the entropic index, we can generate various types of entropy, including the SG entropy, and a different entropy results in a different class of the optimal policy in Tsallis MDPs. We also provide a full mathematical analysis of Tsallis MDPs, including the optimality condition, performance error bounds, and convergence. Our theoretical result enables us to use any positive entropic index in RL. To handle complex and large-scale problems, we propose a model-free actor-critic RL method using Tsallis entropy maximization. We evaluate the re
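As a rough illustration of how the entropic index selects among entropies, the sketch below computes a Tsallis entropy for a discrete action distribution. It assumes the common convention S_q(p) = (1 - Σ_i p_i^q) / (q - 1), which may differ from the paper's normalization by a constant factor; the function name `tsallis_entropy` is hypothetical and not taken from the paper. Under this convention, the limit q → 1 gives the SG entropy.

```python
import math


def tsallis_entropy(probs, q):
    """Tsallis entropy of a discrete distribution (illustrative sketch).

    Assumed convention: S_q(p) = (1 - sum_i p_i**q) / (q - 1) for q != 1,
    with the q -> 1 limit equal to the Shannon-Gibbs entropy -sum_i p_i ln p_i.
    """
    if q <= 0:
        raise ValueError("entropic index q must be positive")
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1: Shannon-Gibbs entropy (terms with p = 0 contribute 0).
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


uniform = [0.25, 0.25, 0.25, 0.25]
greedy = [1.0, 0.0, 0.0, 0.0]

for q in (0.5, 1.0, 2.0):
    print(q, tsallis_entropy(uniform, q), tsallis_entropy(greedy, q))
```

A deterministic policy has zero entropy for every positive index, while the value assigned to a uniform policy changes with q, which is one way to see why different indices favor different classes of optimal policy.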
Related papers
- Enforcing KL Regularization In General Tsallis Entropy Reinforcement Learning Via Advantage Learning (2022)
- Do You Need The Entropy Reward (in Practice)? (2022)
- Parameterized MDPs And Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework (2020)
- Maximum Entropy RL (Provably) Solves Some Robust RL Problems (2021)
- Off-policy Maximum Entropy RL With Future State And Action Visitation Measures (2024)
- Off-policy Actor-critic In An Ensemble: Achieving Maximum General Entropy And Effective Environment Exploration In Deep Reinforcement Learning (2019)
- Path Consistency Learning In Tsallis Entropy Regularized MDPs (2018)
- A Diffusion Model Framework For Maximum Entropy Reinforcement Learning (2025)