Variance Reduction For Evolution Strategies Via Structured Control Variates
2019 · Yunhao Tang, Krzysztof Choromanski, Alp Kucukelbir
Abstract
Evolution Strategies (ES) are a powerful class of blackbox optimization techniques that have recently become a competitive alternative to state-of-the-art policy gradient (PG) algorithms for reinforcement learning (RL). We propose a new method for improving the accuracy of ES algorithms that, unlike recent approaches that exploit only the Monte Carlo structure of the gradient estimator, takes advantage of the underlying MDP structure to reduce variance. We observe that the gradient estimator of the ES objective can alternatively be computed using reparametrization and PG estimators, which leads to new control variate techniques for gradient estimation in ES optimization. We provide theoretical insights and show through extensive experiments that this RL-specific variance reduction approach outperforms general-purpose variance reduction methods.
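To make the idea of a control variate in ES gradient estimation concrete, here is a minimal sketch. It does not reproduce the paper's structured, MDP-based control variate; it uses only the simplest general-purpose one, a constant baseline, on a toy quadratic objective. The names `es_gradient` and `toy_objective`, the smoothing scale `sigma = 0.1`, and the sample counts are assumptions chosen for illustration.

```python
import numpy as np

def es_gradient(f, theta, sigma, n_samples, rng, baseline=False):
    """Monte Carlo estimate of the gradient of E[f(theta + sigma * eps)], eps ~ N(0, I)."""
    eps = rng.standard_normal((n_samples, theta.size))
    values = np.array([f(theta + sigma * e) for e in eps])
    if baseline:
        # Control variate: E[f(theta) * eps] = 0, so subtracting f(theta)
        # leaves the estimator's mean unchanged while shrinking its variance.
        values = values - f(theta)
    return (values[:, None] * eps).mean(axis=0) / sigma

def toy_objective(x):
    # Hypothetical stand-in for an RL return; its gradient at x = 0 is 2 per coordinate.
    return -np.sum((x - 1.0) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(5)

# Repeat each estimator 200 times to measure its empirical variance.
plain = np.array([es_gradient(toy_objective, theta, 0.1, 50, rng)
                  for _ in range(200)])
with_cv = np.array([es_gradient(toy_objective, theta, 0.1, 50, rng, baseline=True)
                    for _ in range(200)])

plain_var = plain.var(axis=0).sum()
cv_var = with_cv.var(axis=0).sum()
print(f"total variance without baseline: {plain_var:.2f}")
print(f"total variance with baseline:    {cv_var:.2f}")
```

Both estimators target the same gradient, but the baseline version has much lower variance on this toy problem. The abstract's claim is that control variates built from the MDP structure can do better than general-purpose choices of this kind.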
Related papers
- Variance Reduction For Policy-gradient Methods Via Empirical Variance Minimization (2022)
- Accelerating Reinforcement Learning With A Directional-gaussian-smoothing Evolution Strategy (2020)
- Action-dependent Control Variates For Policy Optimization Via Stein's Identity (2017)
- Trajectory-wise Control Variates For Variance Reduction In Policy Gradient Methods (2019)
- Stochastic Variance Reduction For Policy Gradient Estimation (2017)
- A Globally Convergent Evolutionary Strategy For Stochastic Constrained Optimization With Applications To Reinforcement Learning (2022)
- On The Convergence And Sample Efficiency Of Variance-reduced Policy Gradient Method (2021)
- Meta Reinforcement Learning With Distribution Of Exploration Parameters Learned By Evolution Strategies (2018)