Boosting Exploration In Actor-critic Algorithms By Incentivizing Plausible Novel States
2022 · Chayan Banerjee, Zhiyong Chen, Nasimul Noman
Abstract
Actor-critic (AC) algorithms are a class of model-free deep reinforcement learning algorithms that have proven effective in diverse domains, especially in solving continuous control problems. Improving exploration (action entropy) and exploitation (expected return) using more efficient samples is a critical issue in AC algorithms. A basic strategy of a learning algorithm is to facilitate indiscriminate exploration of the entire environment state space, and to encourage exploring rarely visited states rather than frequently visited ones. Under this strategy, we propose a new method to boost exploration through an intrinsic reward based on a measurement of a state's novelty and the associated benefit of exploring the state (with regard to policy optimization), together called plausible novelty. With incentivized exploration of plausible novel states, an AC algorithm is able to improve its sample efficiency and hence its training performance. The new method is verified by ext
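The idea of an intrinsic reward that combines a state's novelty with the benefit of exploring it can be illustrated with a minimal sketch. This is not the authors' formulation: the count-based novelty term, the sigmoid of a critic's value estimate used as the "benefit" weight, and the names `PlausibleNoveltyBonus`, `bin_width`, and `scale` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class PlausibleNoveltyBonus:
    """Illustrative intrinsic reward: novelty weighted by estimated benefit.

    Assumptions (not from the paper): novelty is count-based over a
    discretized state, and benefit is a sigmoid of a value estimate.
    """

    def __init__(self, bin_width=0.5, scale=0.1):
        self.bin_width = bin_width      # hypothetical discretization width
        self.scale = scale              # hypothetical bonus coefficient
        self.counts = defaultdict(int)  # visit counts per discretized state

    def _key(self, state):
        # Map a continuous state to a discrete bin so visits can be counted.
        return tuple(math.floor(x / self.bin_width) for x in state)

    def novelty(self, state):
        # Rarely visited bins score close to 1, frequently visited ones decay.
        return 1.0 / math.sqrt(1 + self.counts[self._key(state)])

    @staticmethod
    def _sigmoid(v):
        # Numerically stable logistic function.
        if v >= 0:
            return 1.0 / (1.0 + math.exp(-v))
        e = math.exp(v)
        return e / (1.0 + e)

    def bonus(self, state, value_estimate):
        # Weight novelty by how promising the state looks to the critic,
        # then record the visit.
        r = self.scale * self.novelty(state) * self._sigmoid(value_estimate)
        self.counts[self._key(state)] += 1
        return r


shaper = PlausibleNoveltyBonus()
first = shaper.bonus((0.1, 0.2), value_estimate=0.0)
second = shaper.bonus((0.1, 0.2), value_estimate=0.0)  # same bin, less novel
promising = shaper.bonus((5.0, 5.0), value_estimate=2.0)  # new bin, higher value
```

In an AC training loop, such a bonus would typically be added to the environment reward before the critic update, so that states which are both rarely visited and promising attract more visits.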
Related papers
- Behavior-guided Actor-critic: Improving Exploration Via Learning Policy Behavior Representation For Deep Reinforcement Learning (2021)
- Improving Actor-critic Training With Steerable Action-value Approximation Errors (2024)
- Efficient Exploration In Deep Reinforcement Learning: A Novel Bayesian Actor-critic Algorithm (2024)
- Off-policy Actor-critic In An Ensemble: Achieving Maximum General Entropy And Effective Environment Exploration In Deep Reinforcement Learning (2019)
- Bounded Exploration With World Model Uncertainty In Soft Actor-critic Reinforcement Learning Algorithm (2024)
- Generative Actor-critic: An Off-policy Algorithm Using The Push-forward Model (2021)
- Wasserstein Barycenter Soft Actor-critic (2025)
- Greedy Actor-critic: A New Conditional Cross-entropy Method For Policy Improvement (2018)