Improving Actor-critic Training With Steerable Action-value Approximation Errors
2024 · Bahareh Tasdighi, Nicklas Werge, Yi-Shan Wu, et al.
Abstract
Off-policy actor-critic algorithms have shown strong potential in deep reinforcement learning for continuous control tasks. Their success primarily comes from leveraging pessimistic state-action value function updates, which reduce function approximation errors and stabilize learning. However, excessive pessimism can limit exploration, preventing the agent from effectively refining its policies. Conversely, optimism can encourage exploration but may lead to high-risk behaviors and unstable learning if not carefully managed. To address this trade-off, we propose Utility Soft Actor-Critic (USAC), a novel framework that allows independent, interpretable control of pessimism and optimism for both the actor and the critic. USAC dynamically adapts its exploration strategy based on the uncertainty of critics using a utility function, enabling a task-specific balance between optimism and pessimism. This approach goes beyond binary choices of pessimism or optimism, making the method both theore