Where Did My Optimum Go?: An Empirical Analysis Of Gradient Descent Optimization In Policy Gradient Methods
2018 · Peter Henderson, Joshua Romoff, Joelle Pineau
Abstract
Recent analyses of certain gradient descent optimization methods have shown that performance can degrade in some settings, such as with stochasticity or implicit momentum. In deep reinforcement learning (Deep RL), such optimization methods are often used for training neural networks via the temporal difference error or policy gradient. As an agent improves over time, the optimization target changes and thus the loss landscape (and local optima) change. Due to the failure modes of those methods, the ideal choice of optimizer for Deep RL remains unclear. As such, we provide an empirical analysis of the effects that a wide range of gradient descent optimizers and their hyperparameters have on policy gradient methods, a subset of Deep RL algorithms, for benchmark continuous control tasks. We find that adaptive optimizers have a narrow window of effective learning rates, diverging in other cases, and that the effectiveness of momentum varies depending on the properties of the environment.
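To make the idea of a learning-rate sensitivity sweep concrete, here is a minimal sketch, not the authors' experimental setup. It assumes a one-dimensional toy loss theta**2 rather than a policy gradient objective, and all function names (sgd_momentum_step, adam_step, final_loss, sweep) are hypothetical. The update rules are the standard heavy-ball momentum and Adam formulas.

```python
import math

def sgd_momentum_step(theta, grad, state, lr, momentum=0.9):
    # Heavy-ball momentum: accumulate a velocity, then step along it.
    v = momentum * state.get("v", 0.0) + grad
    state["v"] = v
    return theta - lr * v

def adam_step(theta, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    # Standard Adam update with bias-corrected moment estimates.
    t = state.get("t", 0) + 1
    m = beta1 * state.get("m", 0.0) + (1 - beta1) * grad
    v = beta2 * state.get("v", 0.0) + (1 - beta2) * grad * grad
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)

def final_loss(step_fn, lr, steps=200, theta0=5.0):
    # Assumption: toy loss theta**2 (gradient 2*theta), not an RL task.
    theta, state = theta0, {}
    for _ in range(steps):
        theta = step_fn(theta, 2.0 * theta, state, lr)
        if not math.isfinite(theta) or abs(theta) > 1e12:
            return math.inf  # treat blow-up as divergence
    return theta ** 2

def sweep(learning_rates):
    # Final loss for every (optimizer, learning rate) pair.
    optimizers = {"sgd_momentum": sgd_momentum_step, "adam": adam_step}
    return {
        name: {lr: final_loss(fn, lr) for lr in learning_rates}
        for name, fn in optimizers.items()
    }

if __name__ == "__main__":
    for name, results in sweep([0.001, 0.01, 0.1, 1.0, 2.0]).items():
        print(name, results)
```

Running the script prints the final loss for each optimizer at each learning rate, with inf marking a run that diverged. A toy quadratic says nothing about the continuous control results in the paper; it only illustrates the shape of such a sweep.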
Related papers
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- A Closer Look At Deep Policy Gradients (2018)
- An Empirical Analysis Of Measure-valued Derivatives For Policy Gradients (2021)
- Global Convergence Of Policy Gradient Methods In Reinforcement Learning, Games And Control (2023)
- On The Theory Of Policy Gradient Methods: Optimality, Approximation, And Distribution Shift (2019)
- Policy Gradient Algorithms Implicitly Optimize By Continuation (2023)
- Identifying Policy Gradient Subspaces (2024)
- An Analysis Of Measure-valued Derivatives For Policy Gradients (2022)