On The Sample Complexity Of Actor-critic Method For Reinforcement Learning With Function Approximation
2019 · Harshat Kumar, Alec Koppel, Alejandro Ribeiro
Abstract
Reinforcement learning, mathematically described by Markov Decision Problems, may be approached either through dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between value function estimation steps and policy gradient updates. Because the updates exhibit correlated noise and biased gradients, only the asymptotic behavior of actor-critic is known, established by connecting it to dynamical systems. This work puts forth a new variant of actor-critic that employs Monte Carlo rollouts during the policy search updates, which results in a controllable bias that depends on the number of critic evaluations. As a result, we are able to provide, for the first time, the convergence rate of actor-critic algorithms when the policy search step employs policy gradient, agnostic to the choice of policy evaluation technique. In particular, we establish conditions under which the sample complexity is comparable to stoch
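The alternation the abstract describes (a batch of critic evaluations followed by one policy gradient step) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the two-state toy MDP, the tabular softmax policy, the TD(0) critic, the step sizes, and every function name. The actor uses the critic's TD error as an advantage estimate, a standard actor-critic choice swapped in for simplicity, and draws its update state from a rollout with geometrically distributed length, which is one common way to sample from the discounted state distribution.

```python
import math
import random

# Hypothetical toy MDP (an assumption, not from the paper):
# action 1 pays reward 1, action 0 pays 0; the next state is uniform.
N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.9


def step(state, action, rng):
    """Return (next_state, reward) for the toy MDP."""
    reward = 1.0 if action == 1 else 0.0
    return rng.randrange(N_STATES), reward


def softmax_policy(theta, state):
    """Action probabilities of a tabular softmax policy."""
    prefs = theta[state]
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(theta, state, rng):
    probs = softmax_policy(theta, state)
    return rng.choices(range(N_ACTIONS), weights=probs)[0]


def critic_td0(theta, value, n_evals, alpha, rng):
    """Critic: n_evals TD(0) updates of the value table under the current policy."""
    state = rng.randrange(N_STATES)
    for _ in range(n_evals):
        action = sample_action(theta, state, rng)
        next_state, reward = step(state, action, rng)
        td_error = reward + GAMMA * value[next_state] - value[state]
        value[state] += alpha * td_error
        state = next_state


def rollout_state(theta, rng):
    """Monte Carlo rollout that continues with probability GAMMA at each step,
    so the returned state follows the discounted state distribution."""
    state = rng.randrange(N_STATES)
    while rng.random() < GAMMA:
        action = sample_action(theta, state, rng)
        state, _ = step(state, action, rng)
    return state


def actor_step(theta, value, beta, rng):
    """Actor: one stochastic policy gradient step using the critic's TD error."""
    state = rollout_state(theta, rng)
    action = sample_action(theta, state, rng)
    next_state, reward = step(state, action, rng)
    advantage = reward + GAMMA * value[next_state] - value[state]
    probs = softmax_policy(theta, state)
    for a in range(N_ACTIONS):
        # Gradient of log softmax w.r.t. theta[state][a].
        grad_log = (1.0 if a == action else 0.0) - probs[a]
        theta[state][a] += beta * advantage * grad_log


def train(n_iters=2000, n_evals=20, alpha=0.05, beta=0.05, seed=0):
    """Alternate n_evals critic updates with one actor update, n_iters times."""
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    value = [0.0] * N_STATES
    for _ in range(n_iters):
        critic_td0(theta, value, n_evals, alpha, rng)
        actor_step(theta, value, beta, rng)
    return theta, value
```

In this sketch `n_evals` plays the role of the number of critic evaluations per policy update: a larger value gives the actor a more accurate value estimate at the cost of more samples per iteration. Calling `train()` and inspecting `softmax_policy(theta, s)` for each state shows how strongly the learned policy prefers the rewarding action.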
Related papers
- An Approximate Policy Iteration Viewpoint Of Actor-critic Algorithms (2022)
- Finite-sample Analysis Of Off-policy Natural Actor-critic With Linear Function Approximation (2021)
- Decision-aware Actor-critic With Function Approximation And Theoretical Guarantees (2023)
- Compatible Gradient Approximations For Actor-critic Algorithms (2024)
- Convergent Actor-critic Algorithms Under Off-policy Training And Function Approximation (2018)
- Improving Sample Complexity Bounds For (natural) Actor-critic Algorithms (2020)
- Single-timescale Actor-critic Provably Finds Globally Optimal Policy (2020)
- Analysis Of A Target-based Actor-critic Algorithm With Linear Function Approximation (2021)