Path Consistency Learning in Tsallis Entropy Regularized MDPs
2018 · Ofir Nachum, Yinlam Chow, Mohammad Ghavamzadeh
Abstract
We study the sparse entropy-regularized reinforcement learning (ERL) problem, in which the entropy term is a special form of the Tsallis entropy. The optimal policy of this formulation is sparse, i.e., at each state it assigns non-zero probability to only a small number of actions. This addresses the main drawback of the standard Shannon entropy-regularized RL (soft ERL) formulation, in which the optimal policy is a softmax and may therefore assign non-negligible probability mass to non-optimal actions. This problem is aggravated as the number of actions increases. In this paper, we follow the work of Nachum et al. (2017) in the soft ERL setting and propose a class of novel path consistency learning (PCL) algorithms, called sparse PCL, for the sparse ERL problem that can work with both on-policy and off-policy data. We first derive a sparse consistency equation that specifies a relationship between the optimal value function and policy of the sparse ERL along any system t
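The contrast between a softmax policy and a sparse policy can be illustrated numerically. The sketch below assumes, following the sparse-Tsallis-entropy MDP literature, that the sparse-ERL optimal policy at a state takes the form of a sparsemax (Euclidean projection onto the probability simplex) over the action values scaled by the regularization weight; the paper's exact parameterization may differ. The function names, the Q-values, and the weight `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(z):
    # Soft-ERL-style policy: every action gets strictly positive probability.
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sparsemax(z):
    # Assumed sparse-ERL-style policy: Euclidean projection of z onto the
    # probability simplex (sparsemax). Actions below the threshold tau get 0.
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    # The support is the largest k with 1 + k * z_(k) > sum_{j<=k} z_(j).
    in_support = 1 + k * z_sorted > cumsum
    k_max = k[in_support][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max
    return np.maximum(z - tau, 0.0)

# Hypothetical action values at one state and regularization weight.
lam = 0.5
q = np.array([1.0, 0.9, 0.2, 0.0, -0.5])
z = q / lam

# Many actions: one clearly best action among 1000 (also hypothetical).
q_many = np.concatenate([[1.0], np.zeros(999)])
z_many = q_many / lam
```

With these values, `sparsemax(z)` keeps only the two best actions (probabilities 0.6 and 0.4), while `softmax(z)` spreads mass over all five. With 1000 actions, `softmax(z_many)` leaves less than 1% of the mass on the best action, whereas `sparsemax(z_many)` puts all of it there, which matches the abstract's point that the softmax drawback grows with the number of actions.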
Related papers
- Tsallis Reinforcement Learning: A Unified Framework For Maximum Entropy Reinforcement Learning (2019)
- Enforcing KL Regularization In General Tsallis Entropy Reinforcement Learning Via Advantage Learning (2022)
- A Regularized Approach To Sparse Optimal Policy In Reinforcement Learning (2019)
- Policy Optimization Reinforcement Learning With Entropy Regularization (2019)
- EPO: Entropy-regularized Policy Optimization For LLM Agents Reinforcement Learning (2025)
- Marginalized State Distribution Entropy Regularization In Policy Optimization (2019)
- Entropy Regularized Reinforcement Learning Using Large Deviation Theory (2021)
- Soft Policy Gradient Method For Maximum Entropy Deep Reinforcement Learning (2019)