Non-asymptotic Convergence Analysis Of Two Time-scale (natural) Actor-critic Algorithms
2020 · Tengyu Xu, Zhe Wang, Yingbin Liang
Abstract
As an important class of reinforcement learning algorithms, actor-critic (AC) and natural actor-critic (NAC) algorithms are typically executed in one of two ways to find optimal policies. In the first, nested-loop design, each actor update of the policy is followed by an entire loop of critic updates of the value function, and the finite-sample analysis of such AC and NAC algorithms has recently been well established. The second, two time-scale design, in which the actor and critic update simultaneously but with different learning rates, has far fewer tuning parameters than the nested-loop design and is hence substantially easier to implement. Although two time-scale AC and NAC have been shown to converge in the literature, their finite-sample convergence rate has not been established. In this paper, we provide the first such non-asymptotic convergence rate for two time-scale AC and NAC under Markovian sampling and with the actor having general policy class approximation. We show that two time-scale AC r
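To make the two time-scale design concrete, the following is a minimal sketch, not the paper's algorithm or its step-size schedule. It assumes a hypothetical two-state, two-action toy MDP, a tabular softmax actor, and a tabular TD(0) critic. The step-size exponents (0.9 for the actor, 0.6 for the critic) and the constant 0.5 are illustrative choices, picked only so that the actor's step size decays faster than the critic's. Both updates happen on every sample of a single Markovian trajectory, with no inner critic loop.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def step(state, action, rng):
    """Hypothetical toy MDP: action 1 usually leads to state 1, which pays reward 1."""
    p_to_1 = 0.9 if action == 1 else 0.1
    next_state = 1 if rng.random() < p_to_1 else 0
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def two_timescale_ac(num_steps=20000, gamma=0.9, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # actor: softmax preferences per state
    v = [0.0, 0.0]                    # critic: tabular value estimates
    state = 0
    for t in range(num_steps):
        # Assumed schedules: the actor step decays faster than the critic step,
        # so the actor moves on the slower time scale.
        alpha = 0.5 / (t + 1) ** 0.9  # actor step size (slow)
        beta = 0.5 / (t + 1) ** 0.6   # critic step size (fast)

        probs = softmax(theta[state])
        action = 0 if rng.random() < probs[0] else 1
        next_state, reward = step(state, action, rng)

        # One TD(0) critic update and one actor update per sample.
        td_error = reward + gamma * v[next_state] - v[state]
        v[state] += beta * td_error
        for a in range(2):
            # Gradient of log softmax policy w.r.t. the preference of action a.
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[state][a] += alpha * td_error * grad_log

        state = next_state  # Markovian sampling: continue the same trajectory
    return theta, v
```

Calling `two_timescale_ac()` returns the final preferences and value estimates; `softmax(theta[s])` gives the learned action probabilities in state `s`. A nested-loop variant would instead run many critic updates between consecutive actor updates, which adds the inner-loop length as an extra tuning parameter.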
Related papers
- A Finite Time Analysis Of Two Time-scale Actor Critic Methods (2020)
- Non-asymptotic Analysis For Single-loop (natural) Actor-critic With Compatible Function Approximation (2024)
- Improving Sample Complexity Bounds For (natural) Actor-critic Algorithms (2020)
- Finite Sample Analysis Of Two-time-scale Natural Actor-critic Algorithm (2021)
- Finite-time Analysis Of Fully Decentralized Single-timescale Actor-critic (2022)
- Global Convergence Of Two-timescale Actor-critic For Solving Linear Quadratic Regulator (2022)
- Single-timescale Actor-critic Provably Finds Globally Optimal Policy (2020)
- Finite-time Analysis Of Single-timescale Actor-critic (2022)