ClipUp: A Simple And Powerful Optimizer For Distribution-based Policy Evolution
2020 · Nihat Engin Toklu, Paweł Liskowski, Rupesh Kumar Srivastava
Abstract
Distribution-based search algorithms are an effective approach for evolutionary reinforcement learning of neural network controllers. In these algorithms, gradients of the total reward with respect to the policy parameters are estimated using a population of solutions drawn from a search distribution, and then used for policy optimization with stochastic gradient ascent. A common choice in the community is to use the Adam optimization algorithm for obtaining an adaptive behavior during gradient ascent, due to its success in a variety of supervised learning settings. As an alternative to Adam, we propose to enhance classical momentum-based gradient ascent with two simple techniques: gradient normalization and update clipping. We argue that the resulting optimizer called ClipUp (short for "clipped updates") is a better choice for distribution-based policy evolution because its working principles are simple and easy to understand and its hyperparameters can be tuned more intuitively in pr
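The update described in the abstract (momentum-based ascent with gradient normalization and update clipping) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' reference implementation: the function name `clipup_step`, the parameter names `step_size`, `max_speed`, and `momentum`, and their default values are assumptions chosen for readability. The gradient passed in stands in for the population-based estimate that a distribution-based search algorithm would supply.

```python
import math


def clipup_step(theta, velocity, grad, step_size=0.1, max_speed=0.2, momentum=0.9):
    """One ClipUp-style ascent step on plain Python lists (illustrative sketch).

    theta, velocity, and grad are equal-length lists of floats.
    Returns the updated (theta, velocity) pair without mutating the inputs.
    """
    # Gradient normalization: keep only the direction of the estimated gradient.
    grad_norm = math.sqrt(sum(g * g for g in grad))
    if grad_norm > 0.0:
        unit_grad = [g / grad_norm for g in grad]
    else:
        # A zero gradient has no direction; contribute nothing this step.
        unit_grad = [0.0 for _ in grad]

    # Classical momentum: blend the previous velocity with the new direction.
    new_velocity = [momentum * v + step_size * u
                    for v, u in zip(velocity, unit_grad)]

    # Update clipping: cap the norm of the update at max_speed.
    speed = math.sqrt(sum(v * v for v in new_velocity))
    if speed > max_speed:
        new_velocity = [max_speed * v / speed for v in new_velocity]

    # Gradient ascent: move the parameters along the (clipped) velocity.
    new_theta = [t + v for t, v in zip(theta, new_velocity)]
    return new_theta, new_velocity


if __name__ == "__main__":
    # Toy objective (an assumption for the demo): maximize -(x - 1)^2 - (y + 2)^2,
    # using its analytic gradient in place of a population-based estimate.
    theta, velocity = [0.0, 0.0], [0.0, 0.0]
    for _ in range(200):
        grad = [-2.0 * (theta[0] - 1.0), -2.0 * (theta[1] + 2.0)]
        theta, velocity = clipup_step(theta, velocity, grad)
    print(theta)
```

Because the gradient is normalized, `step_size` directly sets how far a single gradient pushes the velocity, and `max_speed` bounds how far the parameters can move in one step regardless of the gradient's magnitude. That is one way to read the abstract's claim that the hyperparameters are intuitive to tune.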
Related papers
- Neural PPO-Clip Attains Global Optimality: A Hinge Loss Perspective (2021)
- Policy Search By Target Distribution Learning For Continuous Control (2019)
- Optimistic Natural Policy Gradient: A Simple Efficient Policy Optimization Framework For Online RL (2023)
- Communication-efficient Policy Gradient Methods For Distributed Reinforcement Learning (2018)
- Policy Optimization By Genetic Distillation (2017)
- Moments Matter: Stabilizing Policy Optimization Using Return Distributions (2026)
- Distributional Policy Optimization: An Alternative Approach For Continuous Control (2019)
- Supplementing Gradient-based Reinforcement Learning With Simple Evolutionary Ideas (2023)