Deployment-efficient Reinforcement Learning Via Model-based Offline Optimization
2020 · Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, et al.
Abstract
Most reinforcement learning (RL) algorithms assume online access to the environment, in which one may readily interleave updates to the policy with experience collection using that policy. However, in many real-world applications such as health, education, dialogue agents, and robotics, the cost or potential risk of deploying a new data-collection policy is high, to the point that it can become prohibitive to update the data-collection policy more than a few times during learning. With this view, we propose a novel concept of deployment efficiency, measuring the number of distinct data-collection policies that are used during policy learning. We observe that naïvely applying existing model-free offline RL algorithms recursively does not lead to a practical deployment-efficient and sample-efficient algorithm. We propose a novel model-based algorithm, Behavior-Regularized Model-ENsemble (BREMEN), that can effectively optimize a policy offline using 10-20 times fewer data than prior
Related papers
- MOReL: Model-based Offline Reinforcement Learning (2020)
- Conservative Bayesian Model-based Value Expansion For Offline Policy Optimization (2022)
- Overcoming Model Bias For Robust Offline Deep Reinforcement Learning (2020)
- Enhancing Offline Model-based RL Via Active Model Selection: A Bayesian Optimization Perspective (2025)
- Policy-driven World Model Adaptation For Robust Offline Model-based Reinforcement Learning (2025)
- Distributionally Robust Model-based Offline Reinforcement Learning With Near-optimal Sample Complexity (2022)
- Statistically Efficient Advantage Learning For Offline Reinforcement Learning In Infinite Horizons (2022)
- Leveraging Offline Data In Online Reinforcement Learning (2022)