On Effective Scheduling Of Model-based Reinforcement Learning
2021 · Hang Lai, Jian Shen, Weinan Zhang, et al.
Abstract
Model-based reinforcement learning has attracted wide attention due to its superior sample efficiency. Despite its impressive success so far, it is still unclear how to appropriately schedule important hyperparameters, such as the real data ratio for policy optimization in Dyna-style model-based algorithms, to achieve adequate performance. In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data yields better performance. Inspired by the analysis, we propose a framework named AutoMBPO to automatically schedule the real data ratio as well as other hyperparameters when training the model-based policy optimization (MBPO) algorithm, a representative running case of model-based methods. On several continuous control tasks, the MBPO instance trained with hyperparameters scheduled by AutoMBPO can significantly surpass the original one, and the real data ratio schedule found by AutoMBPO shows consistenc
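To make the notion of a real data ratio concrete, the following is a minimal Python sketch of how a Dyna-style training loop might mix real and model-generated transitions in each policy-update batch. It is not the AutoMBPO method: AutoMBPO learns its schedule, whereas the linear ramp here is a hand-written assumption chosen only to illustrate a gradually increasing ratio. The function names, the default `start` and `end` values, and the list-based buffers are all hypothetical.

```python
import random


def real_ratio_schedule(step, total_steps, start=0.05, end=0.5):
    """Hypothetical linear ramp of the real data ratio from `start` to `end`.

    This hand-written schedule is an assumption for illustration only;
    it stands in for whatever schedule a tuner would actually produce.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return start + frac * (end - start)


def sample_mixed_batch(real_buffer, model_buffer, batch_size, real_ratio, rng=random):
    """Draw a batch in which roughly `real_ratio` of the samples are real transitions."""
    n_real = min(round(batch_size * real_ratio), batch_size)
    n_model = batch_size - n_real
    # Sample with replacement from each buffer, then shuffle the combined batch.
    batch = rng.choices(real_buffer, k=n_real) + rng.choices(model_buffer, k=n_model)
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    # Toy buffers: each transition is tagged with its source.
    real_buffer = [("real", i) for i in range(100)]
    model_buffer = [("model", i) for i in range(1000)]

    for step in (0, 500, 1000):
        ratio = real_ratio_schedule(step, total_steps=1000)
        batch = sample_mixed_batch(real_buffer, model_buffer, 256, ratio)
        n_real = sum(1 for source, _ in batch if source == "real")
        print(f"step={step} ratio={ratio:.3f} real_in_batch={n_real}")
```

In this sketch the ratio is a pure function of the training step, so a learned schedule could be substituted by replacing `real_ratio_schedule` without touching the sampling code.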
Related papers
- How To Fine-tune The Model: Unified Model Shift And Model Bias Policy Optimization (2023)
- When To Trust Your Model: Model-based Policy Optimization (2019)
- Enhancing Offline Model-based RL Via Active Model Selection: A Bayesian Optimization Perspective (2025)
- A Model-based Approach For Sample-efficient Multi-task Reinforcement Learning (2019)
- Sample-efficient Automated Deep Reinforcement Learning (2020)
- B2MAPO: A Batch-by-batch Multi-agent Policy Optimization To Balance Performance And Efficiency (2024)
- Double Horizon Model-based Policy Optimization (2025)
- Conservative Dual Policy Optimization For Efficient Model-based Reinforcement Learning (2022)