Variance Reduction For Score Functions Using Optimal Baselines
2022 · Ronan Keane, H. Oliver Gao
Abstract
Many problems involve the use of models which learn probability distributions or incorporate randomness in some way. In such problems, because computing the true expected gradient may be intractable, a gradient estimator is used to update the model parameters. When the model parameters directly affect a probability distribution, the gradient estimator will involve score function terms. This paper studies baselines, a variance reduction technique for score functions. Motivated primarily by reinforcement learning, we derive for the first time an expression for the optimal state-dependent baseline, the baseline which results in a gradient estimator with minimum variance. Although we show that there exist examples where the optimal baseline may be arbitrarily better than a value function baseline, we find that the value function baseline usually performs similarly to an optimal baseline in terms of variance reduction. Moreover, the value function can also be used for bootstrapping estimato
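As a minimal sketch of the idea the abstract describes, the following compares a score function gradient estimator with no baseline, with a mean-reward baseline (a single-state stand-in for a value function baseline), and with the variance-minimizing constant baseline. Everything here is an illustrative assumption rather than the paper's setup: the Gaussian sampling distribution, the objective `f`, the sample size, and the names `estimator`, `b_mean`, and `b_opt` are hypothetical. The constant baseline formula E[f s^2] / E[s^2] is the standard result for a scalar parameter, not the state-dependent expression derived in the paper.

```python
import numpy as np

# Assumed toy problem: x ~ N(theta, sigma^2), objective f(x) = (x - 3)^2.
rng = np.random.default_rng(0)
theta, sigma, n = 0.0, 1.0, 200_000

x = rng.normal(theta, sigma, size=n)
f = (x - 3.0) ** 2                  # hypothetical objective values
score = (x - theta) / sigma**2      # d/dtheta of log N(x; theta, sigma^2)


def estimator(baseline):
    """Per-sample score function gradient estimates with a constant baseline."""
    return (f - baseline) * score


# Mean-reward baseline: the single-state analogue of a value function baseline.
b_mean = f.mean()

# Variance-minimizing constant baseline for a scalar parameter:
# b* = E[f * score^2] / E[score^2], estimated here from the same samples.
b_opt = np.mean(f * score**2) / np.mean(score**2)

var_none = estimator(0.0).var()
var_mean = estimator(b_mean).var()
var_opt = estimator(b_opt).var()

# All three estimators target the same gradient, 2 * (theta - 3) = -6 here;
# only their variance differs.
print("gradient estimates:", estimator(0.0).mean(), estimator(b_mean).mean(), estimator(b_opt).mean())
print("variances (none, mean, optimal):", var_none, var_mean, var_opt)
```

In this toy case the mean-reward baseline removes most of the variance and the optimal constant baseline improves on it only modestly, which is in line with the abstract's observation that a value function baseline usually performs similarly to an optimal one.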
Related papers
- Beyond Variance Reduction: Understanding The True Impact Of Baselines On Policy Optimization (2020)
- Variance Reduction For Policy Gradient With Action-dependent Factorized Baselines (2018)
- Loaded Dice: Trading Off Bias And Variance In Any-order Score Function Estimators For Reinforcement Learning (2019)
- Low Variance Off-policy Evaluation With State-based Importance Sampling (2022)
- Variance Reduction In Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) For Extensive Form Games Using Baselines (2018)
- Sharp Variance-dependent Bounds In Reinforcement Learning: Best Of Both Worlds In Stochastic And Deterministic Environments (2023)
- Hindsight Value Function For Variance Reduction In Stochastic Dynamic Environment (2021)
- Variance Reduction For Policy-gradient Methods Via Empirical Variance Minimization (2022)