A KL-regularization Framework For Learning To Plan With Adaptive Priors
2025 · Álvaro Serra-Gomez, Daniel Jarne Ornia, Dhruva Tirumala, et al.
Abstract
Effective exploration remains a central challenge in model-based reinforcement learning (MBRL), particularly in high-dimensional continuous control tasks where sample efficiency is crucial. A prominent line of recent work leverages learned policies as proposal distributions for Model-Predictive Path Integral (MPPI) planning. Initial approaches update the sampling policy independently of the planner distribution, typically maximizing a learned value function with deterministic policy gradient and entropy regularization. However, because the states encountered during training depend on the MPPI planner, aligning the sampling policy with the planner improves the accuracy of value estimation and long-term performance. To this end, recent methods update the sampling policy by minimizing KL divergence to the planner distribution or by introducing planner-guided regularization into the policy update. In this work, we unify these MPPI-based reinforcement learning methods under a single framework…
Related papers
- Plan To Predict: Learning An Uncertainty-foreseeing Model For Model-based Reinforcement Learning (2023)
- Policy-aware Model Learning For Policy Gradient Methods (2020)
- Exploiting Hierarchy For Learning And Transfer In KL-regularized RL (2019)
- PC-MLP: Model-based Reinforcement Learning With Policy Cover Guided Exploration (2021)
- Theoretically Guaranteed Policy Improvement Distilled From Model-based Planning (2023)
- COPlanner: Plan To Roll Out Conservatively But To Explore Optimistically For Model-based RL (2023)
- Learning Adaptive Exploration Strategies In Dynamic Environments Through Informed Policy Regularization (2020)
- A Regularized Approach To Sparse Optimal Policy In Reinforcement Learning (2019)