Revisiting Stochastic Off-policy Action-value Gradients
2017 · Yemi Okesanjo, Victor Kofia
Abstract
Off-policy stochastic actor-critic methods rely on approximating the stochastic policy gradient in order to derive an optimal policy. One may also derive the optimal policy by approximating the action-value gradient. The use of action-value gradients is desirable as policy improvement occurs along the direction of steepest ascent. This has been studied extensively within the context of natural gradient actor-critic algorithms and more recently within the context of deterministic policy gradients. In this paper we briefly discuss the off-policy stochastic counterpart to deterministic action-value gradients, as well as an incremental approach for following the policy gradient in lieu of the natural gradient.
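The contrast the abstract draws between the stochastic policy gradient and the action-value gradient can be illustrated with a toy calculation. The sketch below is not the paper's algorithm. It assumes a hypothetical one-dimensional problem with a known quadratic critic `q_value`, a Gaussian policy with mean `mu` and fixed standard deviation `sigma`, and no state or behaviour-policy sampling. It compares the score-function estimate of the gradient with respect to `mu` against the estimate obtained by differentiating the critic with respect to the action.

```python
import random
import statistics


def q_value(a):
    # Hypothetical critic, assumed known for illustration: Q(a) = -(a - 2)^2.
    return -(a - 2.0) ** 2


def q_grad(a):
    # Derivative of the assumed critic with respect to the action.
    return -2.0 * (a - 2.0)


def gradient_estimates(mu, sigma, n_samples, seed=0):
    """Return per-sample estimates of dJ/dmu, where J(mu) = E[Q(a)], a ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    score_terms, value_terms = [], []
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        a = mu + sigma * eps  # reparameterised action sample
        # Stochastic policy gradient term: Q(a) * d/dmu log pi(a).
        score_terms.append(q_value(a) * (a - mu) / sigma ** 2)
        # Action-value gradient term: dQ/da * da/dmu, with da/dmu = 1.
        value_terms.append(q_grad(a))
    return score_terms, value_terms


mu, sigma = 0.0, 0.5
score_terms, value_terms = gradient_estimates(mu, sigma, n_samples=20000)
exact = -2.0 * (mu - 2.0)  # closed-form dJ/dmu for this quadratic critic

print("exact gradient:", exact)
print("score-function mean, variance:",
      statistics.fmean(score_terms), statistics.variance(score_terms))
print("action-value mean, variance:",
      statistics.fmean(value_terms), statistics.variance(value_terms))
```

In this toy setting both estimators target the same gradient, while the action-value form has a much smaller per-sample variance. An off-policy variant would additionally draw states from a behaviour policy, which this sketch omits.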
Related papers
- Compatible Gradient Approximations For Actor-critic Algorithms (2024)
- Doubly Robust Off-policy Actor-critic Algorithms For Reinforcement Learning (2019)
- Convergent Actor-critic Algorithms Under Off-policy Training And Function Approximation (2018)
- Neural Network Compatible Off-policy Natural Actor-critic Algorithm (2021)
- Learning Value Functions In Deep Policy Gradients Using Residual Variance (2020)
- Doubly Robust Off-policy Value And Gradient Estimation For Deterministic Policies (2020)
- An Approximate Policy Iteration Viewpoint Of Actor-critic Algorithms (2022)
- Mitigating Suboptimality Of Deterministic Policy Gradients In Complex Q-functions (2024)