Finite-sample Analysis Of Off-policy Natural Actor-critic With Linear Function Approximation
2021 · Zaiwei Chen, Sajad Khodadadian, Siva Theja Maguluri
Abstract
In this paper, we develop a novel variant of the off-policy natural actor-critic algorithm with linear function approximation and establish a sample complexity of \(\mathcal{O}(\epsilon^{-3})\), improving on all previously known convergence bounds for such algorithms. To overcome the divergence caused by the deadly triad in off-policy policy evaluation under function approximation, we develop a critic that employs the \(n\)-step TD-learning algorithm with a properly chosen \(n\). We present finite-sample convergence bounds for this critic under both constant and diminishing step sizes, which are of independent interest. Furthermore, we develop a variant of natural policy gradient under function approximation with an improved convergence rate of \(\mathcal{O}(1/T)\) after \(T\) iterations. Combining the finite-sample error bounds of the actor and the critic, we obtain the \(\mathcal{O}(\epsilon^{-3})\) sample complexity. We derive our sample complexity bounds solely based on the …
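To make the critic idea concrete, here is a minimal NumPy sketch of a generic off-policy \(n\)-step TD update for a linear Q-function estimate \(Q_\theta(s,a) = \phi(s,a)^\top \theta\). It uses per-decision importance ratios \(\rho_j = \pi(A_j \mid S_j)/\mu(A_j \mid S_j)\) and target-policy expectations at the next state. This is an assumed textbook-style variant, not the authors' exact critic: details such as any truncation of the importance ratios are not given in the abstract. The names `n_step_td_update`, `run_critic`, `exact_q`, and the random toy MDP are hypothetical, chosen for illustration.

```python
import numpy as np


def n_step_td_update(theta, phi, transitions, pi, mu, gamma, alpha):
    """One off-policy n-step TD update of a linear Q-function estimate (assumed variant).

    theta:       weights, shape (d,)
    phi:         features phi[s, a], shape (num_states, num_actions, d)
    transitions: n tuples (s_i, a_i, r_i, s_{i+1}) generated by the behavior policy mu
    pi, mu:      target / behavior policies, shape (num_states, num_actions)
    """
    s0, a0 = transitions[0][0], transitions[0][1]
    correction = 0.0
    weight = 1.0  # gamma^(i-k) times the importance ratios rho_{k+1} ... rho_i
    for i, (s, a, r, s_next) in enumerate(transitions):
        if i > 0:
            weight *= gamma * pi[s, a] / mu[s, a]
        # Expected value of the next state under the target policy pi
        v_next = pi[s_next] @ (phi[s_next] @ theta)
        td_error = r + gamma * v_next - phi[s, a] @ theta
        correction += weight * td_error
    # Move the estimate at the first state-action pair of the window
    return theta + alpha * correction * phi[s0, a0]


def exact_q(P, R, pi, gamma):
    """Exact Q^pi of a small MDP, for checking the critic."""
    num_states, num_actions = R.shape
    # P_pi[(s, a), (s', a')] = P(s' | s, a) * pi(a' | s')
    P_pi = np.einsum("sat,tb->satb", P, pi).reshape(num_states * num_actions, -1)
    return np.linalg.solve(np.eye(num_states * num_actions) - gamma * P_pi, R.ravel())


def run_critic(num_states=3, num_actions=2, n=3, gamma=0.5, alpha=0.02,
               num_updates=20000, seed=0):
    """Evaluate a random target policy from uniform-behavior data on a toy MDP."""
    rng = np.random.default_rng(seed)
    P = rng.dirichlet(np.ones(num_states), size=(num_states, num_actions))
    R = rng.uniform(size=(num_states, num_actions))
    pi = rng.dirichlet(np.ones(num_actions), size=num_states)
    mu = np.full((num_states, num_actions), 1.0 / num_actions)
    # One-hot features: the tabular special case of linear function approximation
    phi = np.eye(num_states * num_actions).reshape(num_states, num_actions, -1)
    theta = np.zeros(num_states * num_actions)

    s, window = 0, []
    for _ in range(num_updates + n):
        a = rng.choice(num_actions, p=mu[s])
        s_next = rng.choice(num_states, p=P[s, a])
        window.append((s, a, R[s, a], s_next))
        s = s_next
        if len(window) == n:
            theta = n_step_td_update(theta, phi, window, pi, mu, gamma, alpha)
            window.pop(0)  # slide the n-step window forward by one transition
    return theta, phi, P, R, pi


theta, phi, P, R, pi = run_critic()
print(np.max(np.abs(theta - exact_q(P, R, pi, gamma=0.5))))
```

With one-hot features the linear critic reduces to a tabular one, so the printed gap to the exact \(Q^\pi\) should be small. The sketch uses a constant step size and a uniform behavior policy, so that \(\gamma \rho_j \le 1\) here. With general features, the choice of \(n\) matters for stability, which is the role the abstract assigns to a properly chosen \(n\).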
Related papers
- Non-asymptotic Analysis For Single-loop (natural) Actor-critic With Compatible Function Approximation (2024)
- An Approximate Policy Iteration Viewpoint Of Actor-critic Algorithms (2022)
- On The Sample Complexity Of Actor-critic Method For Reinforcement Learning With Function Approximation (2019)
- Finite Sample Analysis Of Two-time-scale Natural Actor-critic Algorithm (2021)
- Provably Convergent Two-timescale Off-policy Actor-critic With Function Approximation (2019)
- Decision-aware Actor-critic With Function Approximation And Theoretical Guarantees (2023)
- Analysis Of A Target-based Actor-critic Algorithm With Linear Function Approximation (2021)
- Convergent Actor-critic Algorithms Under Off-policy Training And Function Approximation (2018)