Actor-Critic or Critic-Actor? A Tale of Two Time Scales
2022 · Shalabh Bhatnagar, Vivek S. Borkar, Soumyajit Guin
Abstract
We revisit the standard formulation of the tabular actor-critic algorithm as a two-time-scale stochastic approximation, with the value function computed on a faster time scale and the policy computed on a slower one. This emulates policy iteration. We observe that reversing the time scales in fact emulates value iteration and yields a legitimate algorithm. We provide a proof of convergence and compare the two empirically, with and without function approximation (using both linear and nonlinear function approximators), and observe that our proposed critic-actor algorithm performs on par with actor-critic in terms of both accuracy and computational effort.
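The time-scale swap described above can be illustrated with a small tabular sketch. This is not the authors' algorithm or step-size schedule. It assumes a hypothetical random MDP, a softmax (Boltzmann) policy over per-state preferences `theta`, a TD(0) critic `V`, and per-state step sizes `1/(k+1)**0.6` (fast) and `1/(k+1)**0.9` (slow), whose ratio tends to zero as required for two-time-scale schemes. The only difference between the two variants is which component receives the fast step. With `critic_actor=False` the critic is faster and evaluates a quasi-static policy (policy-iteration-like). With `critic_actor=True` the actor is faster, so the policy tracks a near-greedy response to a slowly moving value estimate (value-iteration-like). Value iteration is included only as a reference for the optimal values.

```python
import numpy as np


def make_random_mdp(n_states, n_actions, seed=0):
    """Hypothetical random MDP: P[s, a] is a next-state distribution, R[s, a] in [0, 1)."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)
    R = rng.random((n_states, n_actions))
    return P, R


def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Reference optimal values via the Bellman optimality operator."""
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * P @ V  # (S, A, S) @ (S,) -> (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new


def softmax(x):
    z = np.exp(x - x.max())  # shift by max for numerical stability
    return z / z.sum()


def two_timescale(P, R, gamma=0.9, n_iters=30_000, critic_actor=True, seed=1):
    """Tabular actor-critic (critic fast) or critic-actor (actor fast) along one trajectory.

    Step-size exponents 0.6 / 0.9 are illustrative assumptions, not from the paper.
    """
    rng = np.random.default_rng(seed)
    n_s, n_a = R.shape
    V = np.zeros(n_s)              # critic: state-value estimates
    theta = np.zeros((n_s, n_a))   # actor: softmax preferences
    visits = np.zeros(n_s)
    s = 0
    for _ in range(n_iters):
        pi = softmax(theta[s])
        a = rng.choice(n_a, p=pi)
        s_next = rng.choice(n_s, p=P[s, a])
        delta = R[s, a] + gamma * V[s_next] - V[s]  # TD(0) error

        k = visits[s]
        visits[s] += 1
        fast = 1.0 / (k + 1) ** 0.6
        slow = 1.0 / (k + 1) ** 0.9
        # The only difference between the two schemes: who gets the fast step.
        critic_step, actor_step = (slow, fast) if critic_actor else (fast, slow)

        V[s] += critic_step * delta
        grad_log_pi = -pi              # grad of log softmax w.r.t. theta[s]: e_a - pi
        grad_log_pi[a] += 1.0
        theta[s] += actor_step * delta * grad_log_pi
        s = s_next
    return V, theta


if __name__ == "__main__":
    P, R = make_random_mdp(5, 3)
    print("value iteration:", np.round(value_iteration(P, R), 3))
    for mode in (False, True):
        V, _ = two_timescale(P, R, critic_actor=mode)
        print("critic-actor" if mode else "actor-critic", np.round(V, 3))
```

Because the softmax policy only approaches a deterministic one gradually, the learned `V` in either mode is not expected to match the value-iteration output exactly after a finite run; the sketch is meant to show where the time scales are swapped, not to reproduce the paper's experiments.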