Theoretically Guaranteed Policy Improvement Distilled From Model-based Planning
2023 Β· Chuming Li, Ruonan Jia, Jie Liu, et al.
Abstract
Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks due to its high sample efficiency. To save the computation cost of conducting planning online, recent practices tend to distill optimized action sequences into an RL policy during the training phase. Although the distillation can incorporate both the foresight of planning and the exploration ability of RL policies, the theoretical understanding of these methods is yet unclear. In this paper, we extend the policy improvement step of Soft Actor-Critic (SAC) by developing an approach to distill from model-based planning to the policy. We then demonstrate that such an approach of policy improvement has a theoretical guarantee of monotonic improvement and convergence to the maximum value defined in SAC. We discuss effective design choices and implement our theory as a practical algorithm -- Model-based Planning Distilled to Policy (MPDP) -- that updates the policy jointly ove
Authors
(none)
Tags
Stats
Related papers
- Policy Improvement Reinforcement Learning (2026)0.00
- Constrained Policy Improvement For Safe And Efficient Reinforcement Learning (2018)0.00
- Generalized Policy Improvement Algorithms With Theoretically Supported Sample Reuse (2022)5.24
- Algorithmic Framework For Model-based Deep Reinforcement Learning With Theoretical Guarantees (2018)0.00
- Simplifying Model-based RL: Learning Representations, Latent-space Models, And Policies With One Objective (2022)0.00
- PC-MLP: Model-based Reinforcement Learning With Policy Cover Guided Exploration (2021)0.00
- Efficient Model-based Reinforcement Learning Through Optimistic Policy Search And Planning (2020)0.00
- When To Update Your Model: Constrained Model-based Reinforcement Learning (2022)2.26