Identifying Policy Gradient Subspaces
2024 · Jan Schneider, Pierre Schumacher, Simon Guist, et al.
Abstract
Policy gradient methods hold great potential for solving complex continuous control tasks. Still, their training efficiency can be improved by exploiting structure within the optimization problem. Recent work indicates that supervised learning can be accelerated by leveraging the fact that gradients lie in a low-dimensional and slowly-changing subspace. In this paper, we conduct a thorough evaluation of this phenomenon for two popular deep policy gradient methods on various simulated benchmark tasks. Our results demonstrate the existence of such gradient subspaces despite the continuously changing data distribution inherent to reinforcement learning. These findings reveal promising directions for future work on more efficient reinforcement learning, e.g., through improving parameter-space exploration or enabling second-order optimization.
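To make "gradients lie in a low-dimensional subspace" concrete, the following is a minimal sketch of one way such a claim could be checked numerically. It is not taken from the paper: the abstract does not say how the subspace is constructed, so the choice of the top-k Hessian eigenvectors as the candidate basis, the toy quadratic loss, and the helper name `subspace_fraction` are all assumptions made for illustration.

```python
import numpy as np


def subspace_fraction(grad, basis):
    """Fraction of the squared gradient norm captured by an orthonormal basis.

    `basis` holds the basis vectors as columns (hypothetical helper, not from the paper).
    """
    projected = basis.T @ grad
    return float(projected @ projected / (grad @ grad))


rng = np.random.default_rng(0)
n, k = 50, 5  # parameter dimension and subspace dimension (illustrative values)

# Assumption: a toy quadratic loss 0.5 * x^T H x with k dominant curvature directions.
eigvals = np.concatenate([np.full(k, 100.0), np.full(n - k, 0.1)])
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
H = Q @ np.diag(eigvals) @ Q.T

# Candidate subspace: eigenvectors belonging to the k largest eigenvalues.
# np.linalg.eigh returns eigenvalues in ascending order.
_, V = np.linalg.eigh(H)
top_k = V[:, -k:]

# Gradient of the quadratic at a random point.
x = rng.standard_normal(n)
grad = H @ x
frac_top = subspace_fraction(grad, top_k)

# Baseline for comparison: a random k-dimensional subspace.
R, _ = np.linalg.qr(rng.standard_normal((n, k)))
frac_rand = subspace_fraction(grad, R)

print(f"fraction in top-{k} subspace: {frac_top:.4f}")
print(f"fraction in random subspace: {frac_rand:.4f}")
```

A fraction near 1 means the gradient is almost entirely contained in the candidate subspace. In this toy setting the high-curvature subspace captures far more of the gradient than a random subspace of the same dimension. Whether the same holds during reinforcement learning, where the data distribution keeps changing, is the empirical question the paper evaluates.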
Related papers
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Policy Gradient Using Weak Derivatives For Reinforcement Learning (2020)
- Where Did My Optimum Go?: An Empirical Analysis Of Gradient Descent Optimization In Policy Gradient Methods (2018)
- Global Convergence Of Policy Gradient Methods In Reinforcement Learning, Games And Control (2023)
- Diversity-inducing Policy Gradient: Using Maximum Mean Discrepancy To Find A Set Of Diverse Policies (2019)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- Smoothing Policies And Safe Policy Gradients (2019)
- Policy Optimization With Second-order Advantage Information (2018)