ACE: Off-policy Actor-critic With Causality-aware Entropy Regularization
2024 · Tianying Ji, Yongyuan Liang, Yan Zeng, et al.
Abstract
The varying significance of distinct primitive behaviors during the policy learning process has been overlooked by prior model-free RL algorithms. Leveraging this insight, we explore the causal relationship between different action dimensions and rewards to evaluate the significance of various primitive behaviors during training. We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration. Furthermore, to prevent excessive focus on specific primitive behaviors, we analyze the gradient dormancy phenomenon and introduce a dormancy-guided reset mechanism to further enhance the efficacy of our method. Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks spanning 7 domains compared to model-free RL baselines, which underscores the effectiveness, versatility, and effic
Related papers
- Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning With A Stochastic Actor (2018) 0.00
- Off-policy Actor-critic In An Ensemble: Achieving Maximum General Entropy And Effective Environment Exploration In Deep Reinforcement Learning (2019) 0.00
- Behavior-guided Actor-critic: Improving Exploration Via Learning Policy Behavior Representation For Deep Reinforcement Learning (2021) 0.00
- Greedy Actor-critic: A New Conditional Cross-entropy Method For Policy Improvement (2018) 0.00
- Neural Network Compatible Off-policy Natural Actor-critic Algorithm (2021) 0.00
- Boosting Exploration In Actor-critic Algorithms By Incentivizing Plausible Novel States (2022) 5.24
- Local Advantage Actor-critic For Robust Multi-agent Deep Reinforcement Learning (2021) 7.81
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018) 0.00