Single-timescale Actor-critic Provably Finds Globally Optimal Policy
2020 · Zuyue Fu, Zhuoran Yang, Zhaoran Wang
Abstract
We study the global convergence and global optimality of actor-critic, one of the most popular families of reinforcement learning algorithms. While most existing works on actor-critic employ bi-level or two-timescale updates, we focus on the more practical single-timescale setting, where the actor and critic are updated simultaneously. Specifically, in each iteration, the critic update is obtained by applying the Bellman evaluation operator only once while the actor is updated in the policy gradient direction computed using the critic. Moreover, we consider two function approximation settings where both the actor and critic are represented by linear or deep neural networks. For both cases, we prove that the actor sequence converges to a globally optimal policy at a sublinear \(O(K^{-1/2})\) rate, where \(K\) is the number of iterations. To the best of our knowledge, we establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approx
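To make the single-timescale update concrete, here is a minimal tabular sketch, not the paper's algorithm. It assumes a small known MDP, a tabular critic in place of linear or neural function approximation, and a softmax actor whose logits are moved along the critic's estimate (a natural-gradient-style step chosen here for simplicity). The function name, the toy MDP, and all hyperparameters are hypothetical.

```python
import numpy as np

def softmax(theta):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = theta - theta.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def single_timescale_actor_critic(P, R, gamma=0.9, step=1.0, iters=200):
    """Illustrative sketch only (assumed tabular setting).

    P: (S, A, S) transition probabilities, R: (S, A) rewards.
    """
    S, A = R.shape
    theta = np.zeros((S, A))  # actor: softmax logits
    Q = np.zeros((S, A))      # critic: action-value estimate
    for _ in range(iters):
        pi = softmax(theta)
        # Critic: apply the Bellman evaluation operator under pi exactly once.
        V = (pi * Q).sum(axis=1)       # state values implied by (pi, Q)
        Q_next = R + gamma * (P @ V)   # (S, A, S) @ (S,) -> (S, A)
        # Actor: step using the current critic, in the same iteration.
        theta = theta + step * Q
        Q = Q_next
    return softmax(theta), Q

# Hypothetical 2-state MDP: action a moves to state a; being in state 1 pays 1.
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
R = np.array([[0.0, 0.0], [1.0, 1.0]])
pi, Q = single_timescale_actor_critic(P, R)
```

Both updates read the same iterate (theta, Q), so neither is run to convergence before the other moves, which is the sense in which the scheme is single-timescale. On this toy problem the returned policy should concentrate on action 1 in both states.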
Related papers
- Finite-time Analysis Of Single-timescale Actor-critic (2022)
- Global Convergence Of Two-timescale Actor-critic For Solving Linear Quadratic Regulator (2022)
- Neural Policy Gradient Methods: Global Optimality And Rates Of Convergence (2019)
- Finite Sample Analysis Of Two-time-scale Natural Actor-critic Algorithm (2021)
- A Finite Time Analysis Of Two Time-scale Actor Critic Methods (2020)
- Actor-critic Or Critic-actor? A Tale Of Two Time Scales (2022)
- Analysis Of A Target-based Actor-critic Algorithm With Linear Function Approximation (2021)
- Non-asymptotic Convergence Analysis Of Two Time-scale (natural) Actor-critic Algorithms (2020)