Policy Gradient Methods For Reinforcement Learning With Function Approximation And Action-dependent Baselines
2017 · Philip S. Thomas, Emma Brunskill
Abstract
We show how an action-dependent baseline can be used in the policy gradient theorem with function approximation, which Sutton et al. (2000) originally presented with action-independent baselines.
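As a rough illustration of why the action-dependent case needs separate treatment, the sketch below compares baselines on a single-state softmax policy. It is not the paper's construction: the setup, the numeric values, and the names `softmax`, `score_function_gradient`, and `action_baseline_values` are all assumptions made for this example. It computes the exact expectation of the score-function estimator and checks that a constant baseline leaves it unchanged, while a baseline that varies with the action shifts it unless a correction term is added back.

```python
import numpy as np

def softmax(theta):
    """Softmax policy over actions in a single state (assumed toy setup)."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def score_function_gradient(theta, q, baseline):
    """Exact expectation over actions of grad log pi(a) * (q(a) - baseline(a))."""
    pi = softmax(theta)
    # Row a holds grad_theta log pi(a) = e_a - pi for a softmax policy.
    grad_log_pi = np.eye(len(theta)) - pi
    return (pi * (q - baseline)) @ grad_log_pi

# Hypothetical policy parameters and action values, chosen arbitrarily.
theta = np.array([0.2, -0.5, 1.0])
q = np.array([1.0, 3.0, -2.0])

# No baseline: the reference gradient.
true_grad = score_function_gradient(theta, q, np.zeros(3))

# Action-independent baseline: the expectation is unchanged.
const_baseline = score_function_gradient(theta, q, np.full(3, 0.7))

# Action-dependent baseline: the expectation shifts.
action_baseline_values = np.array([0.5, 2.0, -1.0])
biased = score_function_gradient(theta, q, action_baseline_values)

# Correction term: sum_a grad pi(a) * b(a), with b held fixed with respect to theta.
correction = score_function_gradient(theta, action_baseline_values, np.zeros(3))
corrected = biased + correction

print("constant baseline unbiased:", np.allclose(const_baseline, true_grad))
print("action-dependent baseline unbiased:", np.allclose(biased, true_grad))
print("with correction unbiased:", np.allclose(corrected, true_grad))
```

The expectations are computed in closed form rather than sampled, so the comparison isolates bias and says nothing about variance.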
Related papers
- Variance Reduction For Policy Gradient With Action-dependent Factorized Baselines (2018)
- All-action Policy Gradient Methods: A Numerical Integration Approach (2019)
- Convergent Actor-critic Algorithms Under Off-policy Training And Function Approximation (2018)
- Policy Gradient Using Weak Derivatives For Reinforcement Learning (2020)
- Compatible Gradient Approximations For Actor-critic Algorithms (2024)
- Action-depedent Control Variates For Policy Optimization Via Stein's Identity (2017)
- Variational Policy Gradient Method For Reinforcement Learning With General Utilities (2020)
- Policy Gradient In Partially Observable Environments: Approximation And Convergence (2018)