Convergent Actor-critic Algorithms Under Off-policy Training And Function Approximation
2018 · Hamid Reza Maei
Abstract
We present the first class of policy-gradient algorithms that work with both state-value and policy function approximation and are guaranteed to converge under off-policy training. Our solution targets reinforcement learning problems in which the action representation adds to the curse of dimensionality, that is, problems with continuous or large action sets, where estimating state-action value functions (Q functions) becomes infeasible. Using state-value functions helps to lift the curse and, as a result, naturally turns our policy-gradient solution into a classical Actor-Critic architecture whose Actor uses the state-value function for its update. Our algorithms, Gradient Actor-Critic and Emphatic Actor-Critic, are derived from the exact gradient of the averaged state-value function objective and are thus guaranteed to converge to its optimal solution, while maintaining all the desirable properties of classical Actor-Critic methods with no additional hyper-parameters. To our knowledge, this is the
Related papers
- Compatible Gradient Approximations For Actor-critic Algorithms (2024)
- Decision-aware Actor-critic With Function Approximation And Theoretical Guarantees (2023)
- An Approximate Policy Iteration Viewpoint Of Actor-critic Algorithms (2022)
- Provably Convergent Two-timescale Off-policy Actor-critic With Function Approximation (2019)
- Learning Value Functions In Deep Policy Gradients Using Residual Variance (2020)
- Finite-sample Analysis Of Off-policy Natural Actor-critic With Linear Function Approximation (2021)
- On The Sample Complexity Of Actor-critic Method For Reinforcement Learning With Function Approximation (2019)
- Linear Function Approximation As A Computationally Efficient Method To Solve Classical Reinforcement Learning Challenges (2024)