Online Hyper-parameter Tuning In Off-policy Learning Via Evolutionary Strategies
2020 · Yunhao Tang, Krzysztof Choromanski
Abstract
Off-policy learning algorithms are known to be sensitive to the choice of hyper-parameters. However, unlike near on-policy algorithms, whose hyper-parameters can be optimized via, e.g., meta-gradients, similar techniques cannot be straightforwardly applied to off-policy learning. In this work, we propose a framework that applies Evolutionary Strategies to online hyper-parameter tuning in off-policy learning. Our formulation draws close connections to meta-gradients and leverages the strengths of black-box optimization with relatively low-dimensional search spaces. We show that our method outperforms state-of-the-art off-policy learning baselines with static hyper-parameters, as well as recent prior work, over a wide range of continuous control benchmarks.
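To make the idea of black-box optimization over a low-dimensional hyper-parameter space concrete, here is a minimal sketch of a generic antithetic Evolutionary Strategies update on a single scalar hyper-parameter. It is not the authors' algorithm: the abstract does not specify the update rule, and the sketch involves no off-policy learner. The fitness function `toy_fitness`, the helpers `es_step` and `tune`, and all numeric settings are hypothetical and chosen only for illustration.

```python
import random


def toy_fitness(log_lr):
    # Hypothetical stand-in for "performance obtained with this hyper-parameter".
    # It peaks at log_lr = -3. This is an assumption, not a result from the paper.
    return -(log_lr + 3.0) ** 2


def es_step(theta, fitness, rng, num_pairs=8, sigma=0.1, step_size=0.05):
    """One antithetic ES update on a scalar hyper-parameter theta."""
    grad = 0.0
    for _ in range(num_pairs):
        eps = rng.gauss(0.0, 1.0)
        # Evaluate the mirrored perturbations theta + sigma*eps and theta - sigma*eps.
        diff = fitness(theta + sigma * eps) - fitness(theta - sigma * eps)
        # Finite-difference style estimate of the fitness gradient along eps.
        grad += diff * eps / (2.0 * sigma)
    grad /= num_pairs
    # Gradient ascent: move theta toward higher fitness.
    return theta + step_size * grad


def tune(theta0=0.0, iterations=200, seed=0):
    # Repeatedly apply the ES update, starting from theta0.
    rng = random.Random(seed)
    theta = theta0
    for _ in range(iterations):
        theta = es_step(theta, toy_fitness, rng)
    return theta


if __name__ == "__main__":
    print(tune())
```

The update only queries `fitness` at perturbed points, so it needs no derivative of the objective with respect to the hyper-parameter. In an online setting, the fitness evaluations would presumably come from agents trained with the perturbed hyper-parameters, which this toy example does not model.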
Related papers
- On Hyper-parameter Tuning For Stochastic Optimization Algorithms (2020)
- Fast Efficient Hyperparameter Tuning For Policy Gradients (2019)
- Online Off-policy Prediction (2018)
- Hyperparameter Optimization Can Even Be Harmful In Off-policy Learning And How To Deal With It (2024)
- Off-policy Policy Gradient Algorithms By Constraining The State Distribution Shift (2019)
- Towards Hyperparameter-free Policy Selection For Offline Reinforcement Learning (2021)
- Efficacy Of Modern Neuro-evolutionary Strategies For Continuous Control Optimization (2019)
- A Globally Convergent Evolutionary Strategy For Stochastic Constrained Optimization With Applications To Reinforcement Learning (2022)