Analysis Of A Target-based Actor-critic Algorithm With Linear Function Approximation
2021 · Anas Barakat, Pascal Bianchi, Julien Lehmann
Abstract
Actor-critic methods integrating target networks have exhibited remarkable empirical success in deep reinforcement learning. However, a theoretical understanding of the use of target networks in actor-critic methods is largely missing from the literature. In this paper, we reduce this gap between theory and practice by proposing the first theoretical analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting. Our algorithm uses three different timescales: one for the actor and two for the critic. Instead of using the standard single-timescale temporal difference (TD) learning algorithm as a critic, we use a two-timescale target-based version of TD learning closely inspired by practical actor-critic algorithms implementing target networks. First, we establish asymptotic convergence results for both the critic and the actor under Markovian sampling. Then, we provide a finite-time analysis showing the impact of incorpor
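To make the critic described in the abstract more concrete, here is a minimal Python sketch of a target-based TD(0) update with linear function approximation. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the function name `target_td0`, the constant step sizes `alpha` and `beta`, the choice to update the target weights by a slow moving average of the critic weights (as in common target-network implementations), and the two-state toy chain are all hypothetical. The actor, which the abstract places on a third timescale, is omitted; the policy is held fixed.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def target_td0(features, step, start, gamma, alpha, beta, n_steps):
    """Policy evaluation with a target-based TD(0) critic (illustrative).

    features: dict mapping each state to its feature vector (list of floats)
    step:     function state -> (reward, next_state) under a fixed policy
    alpha:    step size for the critic weights w
    beta:     step size for the target weights w_bar (assumed smaller here)
    """
    dim = len(features[start])
    w = [0.0] * dim      # critic weights
    w_bar = [0.0] * dim  # target weights used for bootstrapping
    s = start
    for _ in range(n_steps):
        r, s_next = step(s)
        phi, phi_next = features[s], features[s_next]
        # TD error bootstraps from the target weights, not from w itself.
        delta = r + gamma * dot(phi_next, w_bar) - dot(phi, w)
        w = [wi + alpha * delta * f for wi, f in zip(w, phi)]
        # Assumption: the target slowly tracks the critic (moving average).
        w_bar = [bi + beta * (wi - bi) for wi, bi in zip(w, w_bar)]
        s = s_next
    return w, w_bar


# Hypothetical toy problem: a deterministic two-state cycle.
# State 0 -> state 1 with reward 1; state 1 -> state 0 with reward 0.
features = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # one-hot features


def cycle(s):
    return (1.0, 1) if s == 0 else (0.0, 0)


w, w_bar = target_td0(features, cycle, start=0, gamma=0.5,
                      alpha=0.1, beta=0.01, n_steps=20000)
```

With discount 0.5, the true values of this toy chain solve V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0), giving V(0) = 4/3 and V(1) = 2/3, so `w` and `w_bar` should both approach those values. Under Markovian sampling with noisy transitions, constant step sizes would leave residual error, which is one reason analyses of this kind typically use decaying step sizes.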
Related papers
- Finite-time Analysis Of Single-timescale Actor-critic (2022)
- Actor-critic Or Critic-actor? A Tale Of Two Time Scales (2022)
- Single-timescale Actor-critic Provably Finds Globally Optimal Policy (2020)
- Decision-aware Actor-critic With Function Approximation And Theoretical Guarantees (2023)
- A Finite Time Analysis Of Two Time-scale Actor Critic Methods (2020)
- Finite-sample Analysis Of Off-policy Natural Actor-critic With Linear Function Approximation (2021)
- Non-asymptotic Analysis For Single-loop (natural) Actor-critic With Compatible Function Approximation (2024)
- On The Sample Complexity Of Actor-critic Method For Reinforcement Learning With Function Approximation (2019)