Diversity Actor-critic: Sample-aware Entropy Regularization For Sample-efficient Exploration
2020 · Seungyul Han, Youngchul Sung
Abstract
In this paper, sample-aware policy entropy regularization is proposed to enhance the conventional policy entropy regularization for better exploration. Exploiting the sample distribution obtainable from the replay buffer, the proposed sample-aware entropy regularization maximizes the entropy of the weighted sum of the policy action distribution and the sample action distribution from the replay buffer for sample-efficient exploration. A practical algorithm named diversity actor-critic (DAC) is developed by applying policy iteration to the objective function with the proposed sample-aware entropy regularization. Numerical results show that DAC significantly outperforms existing recent algorithms for reinforcement learning.
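The idea of maximizing the entropy of a mixture of the policy and the replay-buffer action distribution can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small discrete action space, estimates the sample action distribution from empirical action counts, and uses a hypothetical fixed mixing weight `alpha`. The function names are illustrative only.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

def sample_aware_entropy(policy_probs, buffer_actions, num_actions, alpha=0.5):
    """Entropy of the mixture alpha * pi + (1 - alpha) * q.

    q is the empirical action distribution of the replay buffer
    (an assumption of this sketch: actions are integers in [0, num_actions)).
    """
    counts = np.bincount(np.asarray(buffer_actions), minlength=num_actions)
    q = counts / counts.sum()
    mixture = alpha * np.asarray(policy_probs, dtype=float) + (1.0 - alpha) * q
    return entropy(mixture)

# Replay buffer dominated by action 0; action 2 has never been sampled.
buffer_actions = [0, 0, 0, 0, 0, 0, 1, 1]

# Two policies with identical plain entropy (same probabilities, permuted).
pi_a = [0.8, 0.1, 0.1]  # favors the already well-sampled action 0
pi_b = [0.1, 0.1, 0.8]  # favors the unsampled action 2

h_a = sample_aware_entropy(pi_a, buffer_actions, num_actions=3)
h_b = sample_aware_entropy(pi_b, buffer_actions, num_actions=3)
print(entropy(pi_a), entropy(pi_b))  # equal: plain entropy cannot tell them apart
print(h_a, h_b)                      # the mixture entropy is larger for pi_b
```

In this toy setting conventional entropy regularization scores both policies identically, whereas the mixture entropy favors the policy that puts mass on actions under-represented in the buffer, which is the sample-aware behavior the abstract describes.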
Related papers
- Off-policy Actor-critic In An Ensemble: Achieving Maximum General Entropy And Effective Environment Exploration In Deep Reinforcement Learning (2019)
- Increasing Entropy To Boost Policy Gradient Performance On Personalization Tasks (2023)
- An Entropy Regularization Free Mechanism For Policy-based Reinforcement Learning (2021)
- Marginalized State Distribution Entropy Regularization In Policy Optimization (2019)
- Mutual-information Regularization In Markov Decision Processes And Actor-critic Learning (2019)
- Policy Optimization Reinforcement Learning With Entropy Regularization (2019)
- Decoupled Exploration And Exploitation Policies For Sample-efficient Reinforcement Learning (2021)
- Divergence-regularized Multi-agent Actor-critic (2021)