Off-policy Actor-critic In An Ensemble: Achieving Maximum General Entropy And Effective Environment Exploration In Deep Reinforcement Learning
2019 · Gang Chen, Yiming Peng
Abstract
We propose a new policy iteration theory as an important extension of soft policy iteration and Soft Actor-Critic (SAC), one of the most efficient model-free algorithms for deep reinforcement learning. Supported by the new theory, arbitrary entropy measures that generalize Shannon entropy, such as Tsallis entropy and Rényi entropy, can be used to properly randomize action selection while still maximizing expected long-term rewards. Our theory gives rise to two new algorithms: Tsallis entropy Actor-Critic (TAC) and Rényi entropy Actor-Critic (RAC). Theoretical analysis shows that these algorithms can be more effective than SAC. Moreover, they pave the way for a new Ensemble Actor-Critic (EAC) algorithm, developed in this paper, that features a bootstrap mechanism for deep environment exploration as well as a new value-function-based mechanism for high-level action selection. Empirically, we show that TAC, RAC and EAC can achieve state-of-the-art p
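As a rough illustration of how the entropy measures named in the abstract relate to one another, the sketch below computes Shannon, Tsallis and Rényi entropy for a small discrete action distribution. It uses the standard textbook definitions; the function names, the example distribution `probs`, and the discrete setting are assumptions made for illustration and are not taken from the paper, whose policies act over continuous actions and whose exact parameterization may differ.

```python
import math


def shannon_entropy(p):
    """Shannon entropy H(p) = -sum_i p_i * log(p_i), in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)


def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    Falls back to Shannon entropy at q = 1, which is its limit.
    """
    if math.isclose(q, 1.0):
        return shannon_entropy(p)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)


def renyi_entropy(p, alpha):
    """Renyi entropy H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha).

    Falls back to Shannon entropy at alpha = 1, which is its limit.
    """
    if math.isclose(alpha, 1.0):
        return shannon_entropy(p)
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)


# Hypothetical action probabilities, chosen only for illustration.
probs = [0.7, 0.2, 0.1]

print("Shannon:       ", shannon_entropy(probs))
print("Tsallis (q=2): ", tsallis_entropy(probs, 2.0))
print("Renyi (a=2):   ", renyi_entropy(probs, 2.0))
```

Both generalized measures approach Shannon entropy as their parameter tends to 1, which is the sense in which they generalize it.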
Related papers
- S\(^2\)AC: Energy-based Reinforcement Learning With Stein Soft Actor Critic (2024)
- Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning With A Stochastic Actor (2018)
- DSAC-C: Constrained Maximum Entropy For Robust Discrete Soft-actor Critic (2023)
- Improving Exploration In Soft-actor-critic With Normalizing Flows Policies (2019)
- Behavior-guided Actor-critic: Improving Exploration Via Learning Policy Behavior Representation For Deep Reinforcement Learning (2021)
- Greedy Actor-critic: A New Conditional Cross-entropy Method For Policy Improvement (2018)
- Boosting Soft Actor-critic: Emphasizing Recent Experience Without Forgetting The Past (2019)
- Do You Need The Entropy Reward (in Practice)? (2022)