PIPPS: Flexible Model-based Policy Search Robust To The Curse Of Chaos
2019 · Paavo Parmas, Carl Edward Rasmussen, Jan Peters, et al.
Abstract
Previously, the exploding gradient problem has been explained to be central in deep learning and model-based reinforcement learning, because it causes numerical issues and instability in optimization. Our experiments in model-based reinforcement learning imply that the problem is not just a numerical issue, but it may be caused by a fundamental chaos-like nature of long chains of nonlinear computations. Not only do the magnitudes of the gradients become large, the direction of the gradients becomes essentially random. We show that reparameterization gradients suffer from the problem, while likelihood ratio gradients are robust. Using our insights, we develop a model-based policy search framework, Probabilistic Inference for Particle-Based Policy Search (PIPPS), which is easily extensible, and allows for almost arbitrary models and policies, while simultaneously matching the performance of previous data-efficient learning algorithms. Finally, we invent the total propagation algorithm, w
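The contrast the abstract draws between reparameterization and likelihood ratio gradients can be illustrated on a toy problem. The sketch below is not the PIPPS algorithm and is not taken from the paper: it assumes a scalar Gaussian input, a sigmoid squash, and a chain of logistic-map steps standing in for "a long chain of nonlinear computations". The names `rollout`, `gradient_samples`, and `variance`, the fixed baseline of 0.5, and all parameter values are hypothetical choices made for illustration. Both estimators target the derivative of the expected final value with respect to the input mean.

```python
import math
import random


def rollout(z, steps):
    """Return (cost, d cost / d z) for a sigmoid squash followed by
    `steps` updates of the logistic map x <- 4 x (1 - x).

    The logistic map is an assumed stand-in for a chaotic model rollout.
    """
    x = 1.0 / (1.0 + math.exp(-z))
    dx = x * (1.0 - x)  # derivative of the sigmoid with respect to z
    for _ in range(steps):
        dx *= 4.0 * (1.0 - 2.0 * x)  # chain rule, evaluated at the current x
        x = 4.0 * x * (1.0 - x)
    return x, dx


def gradient_samples(mu, sigma, steps, n, seed=0):
    """Draw n per-sample estimates of d E[cost] / d mu, z ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    rp, lr = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        cost, dcost_dz = rollout(mu + sigma * eps, steps)
        # Reparameterization: z = mu + sigma * eps, so dz/dmu = 1 and the
        # sample is the pathwise derivative through the whole chain.
        rp.append(dcost_dz)
        # Likelihood ratio: cost times the score d log N(z; mu, sigma^2) / d mu,
        # which equals eps / sigma. The constant 0.5 is an assumed baseline;
        # subtracting a constant does not bias the estimator.
        lr.append((cost - 0.5) * eps / sigma)
    return rp, lr


def variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


if __name__ == "__main__":
    rp, lr = gradient_samples(mu=0.0, sigma=0.1, steps=20, n=2000)
    print("RP sample variance:", variance(rp))
    print("LR sample variance:", variance(lr))
```

In this toy setting the cost is bounded in [0, 1], so each likelihood ratio sample is bounded by a multiple of `eps / sigma`, whereas each reparameterization sample is a product of per-step derivatives that grows roughly geometrically with the chain length. The example only shows the variance gap; it says nothing about how PIPPS or total propagation combine the two estimators.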
Related papers
- Gradient-aware Model-based Policy Search (2019)
- Deep Model-based Reinforcement Learning Via Estimated Uncertainty And Conservative Policy Optimization (2019)
- Relative Entropy Pathwise Policy Optimization (2025)
- When To Trust Your Model: Model-based Policy Optimization (2019)
- PC-PG: Policy Cover Directed Exploration For Provable Policy Gradient Learning (2020)
- Fine-tuning Diffusion Policies With Backpropagation Through Diffusion Timesteps (2025)
- A Study Of Policy Gradient On A Class Of Exactly Solvable Models (2020)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)