Actor-critic Reinforcement Learning With Phased Actor
2024 · Ruofan Wu, Junmin Zhong, Jennie Si
Abstract
Policy gradient methods in actor-critic reinforcement learning (RL) have become perhaps the most promising approaches to solving continuous optimal control problems. However, the trial-and-error nature of RL and the inherent randomness associated with solution approximations cause variations in the learned optimal values and policies. This has significantly hindered their successful deployment in real-life applications where control responses need to meet dynamic performance criteria deterministically. Here we propose a novel phased actor in actor-critic (PAAC) method, aiming at improving policy gradient estimation and thus the quality of the control policy. Specifically, PAAC accounts for both the \(Q\) value and the TD error in its actor update. We prove qualitative properties of PAAC for learning convergence of the value and policy, solution optimality, and stability of system dynamics. Additionally, we show variance reduction in policy gradient estimation. PAAC performance is systematicall