An Approximate Policy Iteration Viewpoint Of Actor-critic Algorithms
2022 · Zaiwei Chen, Siva Theja Maguluri
Abstract
In this work, we consider policy-based methods for solving the reinforcement learning problem and establish their sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. For the actor, we consider various policy update rules, including the celebrated natural policy gradient (NPG). In contrast to the gradient ascent viewpoint taken in the literature, we view NPG as an approximate way of implementing policy iteration, and show that NPG (without any regularization) enjoys geometric convergence when using increasing stepsizes. For the critic, we consider TD-learning with linear function approximation and off-policy sampling. Since TD-learning is well known to be unstable in this setting, we propose a stable generic algorithm (including two specific algorithms: the \(\lambda\)-averaged \(Q\)-trace and the two-sided \(Q\)-trace) that uses multi-step returns and generalized importance sampling factors…
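The view of NPG as approximate policy iteration can be illustrated with a minimal sketch, assuming a small tabular MDP, a softmax policy, and an exact critic (the true \(Q^\pi\) obtained from a linear solve) in place of the TD-learning critic studied in the paper. The random MDP, the geometric stepsize schedule `eta_t = eta0 * growth**t`, and the function names are hypothetical choices for illustration only; this is not the authors' algorithm, and it does not implement the \(\lambda\)-averaged or two-sided \(Q\)-trace critics.

```python
import numpy as np


def softmax(logits):
    # Row-wise softmax, shifted by the row max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def evaluate_q(P, R, pi, gamma):
    """Exact Q^pi of a tabular MDP (stands in for the critic in this sketch).

    P: (S, A, S) transition probabilities, R: (S, A) rewards, pi: (S, A) policy.
    """
    S = R.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi
    r_pi = np.sum(pi * R, axis=1)          # expected one-step reward under pi
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return R + gamma * P @ v               # Q(s,a) = R(s,a) + gamma * E[V(s')]


def optimal_v(P, R, gamma, iters=2000):
    # Value iteration, used only to measure the optimality gap.
    v = np.zeros(R.shape[0])
    for _ in range(iters):
        v = np.max(R + gamma * P @ v, axis=1)
    return v


def npg_with_increasing_steps(P, R, gamma, eta0=1.0, growth=2.0, iters=30):
    """Tabular softmax NPG: pi_{t+1}(a|s) ∝ pi_t(a|s) * exp(eta_t * Q^{pi_t}(s,a)).

    eta_t = eta0 * growth**t is a hypothetical increasing schedule; as eta_t
    grows, each update approaches the greedy step of policy iteration.
    """
    v_star = optimal_v(P, R, gamma)
    logits = np.zeros(R.shape)  # uniform initial policy
    gaps = []
    for t in range(iters):
        pi = softmax(logits)
        q = evaluate_q(P, R, pi, gamma)
        v_pi = np.sum(pi * q, axis=1)                # V^pi(s) = sum_a pi(a|s) Q(s,a)
        gaps.append(np.max(np.abs(v_star - v_pi)))   # sup-norm optimality gap
        logits = logits + eta0 * growth**t * q       # multiplicative-weights update in log space
        logits -= logits.max(axis=1, keepdims=True)  # re-center; does not change the policy
    return gaps


rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize into transition probabilities
R = rng.random((S, A))
gaps = npg_with_increasing_steps(P, R, gamma)
print([f"{g:.2e}" for g in gaps[:8]])
```

With an exact \(Q^\pi\), a softmax NPG step never decreases \(V^\pi\) at any state, so the printed gaps should be nonincreasing; once \(\eta_t\) is large the update is effectively greedy with respect to the current \(Q^{\pi_t}\), which is exactly a policy iteration step. The sample-based critic analyzed in the paper adds approximation and statistical error that this sketch deliberately leaves out.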
Related papers
- On The Sample Complexity Of Actor-critic Method For Reinforcement Learning With Function Approximation (2019)
- Finite-sample Analysis Of Off-policy Natural Actor-critic With Linear Function Approximation (2021)
- Beyond The Policy Gradient Theorem For Efficient Policy Updates In Actor-critic Algorithms (2022)
- Convergent Actor-critic Algorithms Under Off-policy Training And Function Approximation (2018)
- Compatible Gradient Approximations For Actor-critic Algorithms (2024)
- Actor-critic Reinforcement Learning With Phased Actor (2024)
- On The Second-order Convergence Of Biased Policy Gradient Algorithms (2023)
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)