Loaded DiCE: Trading Off Bias And Variance In Any-order Score Function Estimators For Reinforcement Learning
2019 · Gregory Farquhar, Shimon Whiteson, Jakob Foerster
Abstract
Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives. We derive an objective that, under automatic differentiation, produces low-variance unbiased estimators of derivatives at any order. Our objective is compatible with arbitrary advantage estimators, which allows the control of the bias and variance of any-order derivatives when using function approximation. Furthermore, we propose a method to trade off bias and variance of higher order derivatives by discounting the impact of more distant causal dependencies. We demonstrate the correctness and utility of our objective in analytically tractable MDPs and in meta-reinforcement-learning for continuous control.
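The abstract does not spell out the objective, so the following is only a minimal sketch of how an objective can yield a score function estimator "under automatic differentiation". It assumes the DiCE-style operator exp(x - stop_gradient(x)), which evaluates to 1 but carries the gradient of x, and it assumes the objective sums discounted advantages weighted by the difference of that operator applied to a decayed running sum of log-probabilities with and without the current action. The `Dual` class is a hypothetical, first-order-only stand-in for a real autodiff library, and the names `magic_box`, `loaded_dice_objective`, `gamma` and `lam` are illustrative and not taken from this page.

```python
import math


class Dual:
    """Hypothetical helper: a value plus its first derivative w.r.t. one scalar parameter."""

    def __init__(self, val, grad=0.0):
        self.val = val
        self.grad = grad

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.grad + other.grad)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.grad - other.grad)

    def __mul__(self, other):
        other = _lift(other)
        # Product rule for the derivative component.
        return Dual(self.val * other.val,
                    self.val * other.grad + self.grad * other.val)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def stop_gradient(x):
    """Keep the value, drop the derivative."""
    return Dual(x.val, 0.0)


def exp(x):
    e = math.exp(x.val)
    return Dual(e, e * x.grad)


def magic_box(x):
    """Assumed DiCE-style operator: value 1, derivative equal to that of x."""
    return exp(x - stop_gradient(x))


def loaded_dice_objective(log_probs, advantages, gamma=1.0, lam=1.0):
    """Sketch of the assumed objective for a single trajectory.

    log_probs: list of Dual, log pi(a_t | s_t) with derivative w.r.t. the parameter.
    advantages: list of float, any advantage estimate A_t (treated as constant).
    gamma: reward discount; lam: decay applied to older log-probabilities.
    """
    w = Dual(0.0)
    objective = Dual(0.0)
    for t, (logp, adv) in enumerate(zip(log_probs, advantages)):
        w = w * lam + logp      # decayed sum including the current action
        v = w - logp            # the same sum excluding the current action
        deps = magic_box(w) - magic_box(v)
        objective = objective + (gamma ** t) * adv * deps
    return objective


# Toy trajectory with made-up log-probabilities and derivatives.
log_probs = [Dual(-0.5, 0.3), Dual(-1.2, -0.7), Dual(-0.1, 0.2)]
advantages = [1.0, -2.0, 0.5]
J = loaded_dice_objective(log_probs, advantages, gamma=0.9, lam=0.5)

# Reference: the familiar first-order estimator sum_t gamma^t * A_t * d(log pi_t).
reference = sum((0.9 ** t) * a * lp.grad
                for t, (lp, a) in enumerate(zip(log_probs, advantages)))
```

In this sketch `J.val` is zero while `J.grad` matches `reference`: the objective has no meaningful forward value, and its derivative is the estimator. With only first derivatives tracked, `lam` has no effect on the result; under the assumptions above it would only change second and higher derivatives, which this toy `Dual` class cannot represent.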
Related papers
- An Analysis Of Measure-valued Derivatives For Policy Gradients (2022)
- An Empirical Analysis Of Measure-valued Derivatives For Policy Gradients (2021)
- Unifying Gradient Estimators For Meta-reinforcement Learning Via Off-policy Evaluation (2021)
- Variance Reduction For Score Functions Using Optimal Baselines (2022)
- GradientDICE: Rethinking Generalized Offline Estimation Of Stationary Values (2020)
- On The Second-order Convergence Of Biased Policy Gradient Algorithms (2023)
- Policy Gradient Using Weak Derivatives For Reinforcement Learning (2020)
- Simple And Optimal Methods For Stochastic Variational Inequalities, II: Markovian Noise And Policy Evaluation In Reinforcement Learning (2020)