Policy Gradient Methods For Reinforcement Learning With Function Approximation And Action-dependent Baselines

Abstract

We show how an action-dependent baseline can be used in the policy gradient theorem with function approximation, a result originally presented with action-independent baselines by Sutton et al. (2000).
