Bayes Adaptive Monte Carlo Tree Search For Offline Model-based Reinforcement Learning
2024 · Jiayu Chen, Le Xu, Wentse Chen, et al.
Abstract
Offline reinforcement learning (RL) is a powerful approach for data-driven decision-making and control. Unlike model-free methods, offline model-based reinforcement learning (MBRL) explicitly learns world models from a static dataset and uses them as surrogate simulators, improving data efficiency and potentially enabling the learned policy to generalize beyond the dataset support. However, many Markov decision processes (MDPs) may behave identically on the offline dataset, and dealing with the uncertainty about the true MDP can be challenging. In this paper, we propose modeling offline MBRL as a Bayes Adaptive Markov Decision Process (BAMDP), a principled framework for addressing model uncertainty. We further propose a novel Bayes Adaptive Monte-Carlo planning algorithm capable of solving BAMDPs in continuous state and action spaces with stochastic transitions. This planning process is based on Monte Carlo Tree Search and can be integrated into offline MBRL as a policy improve
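To make the BAMDP idea concrete, here is a minimal sketch of the belief update at its core: a posterior over candidate dynamics models is reweighted by how well each model explains an observed transition. This is not the paper's algorithm. The finite set of candidate models, the 1-D state, the Gaussian transition noise, and the names `update_belief` and `gaussian_logpdf` are all assumptions made for illustration.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def update_belief(belief, models, s, a, s_next, std=0.1):
    """One Bayes step over candidate models: b'(m) is proportional to b(m) * p_m(s_next | s, a).

    Each model is a hypothetical callable (s, a) -> predicted next state;
    `std` is an assumed Gaussian noise scale shared by all models.
    """
    log_post = [
        (math.log(b) if b > 0 else float("-inf")) + gaussian_logpdf(s_next, m(s, a), std)
        for b, m in zip(belief, models)
    ]
    peak = max(log_post)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Two hypothetical candidate dynamics models with a uniform prior.
models = [lambda s, a: s + a, lambda s, a: s + 0.5 * a]
belief = [0.5, 0.5]

# Observing s_next = 1.0 from (s=0.0, a=1.0) favors the first model.
belief = update_belief(belief, models, s=0.0, a=1.0, s_next=1.0)
```

In a Bayes-adaptive planner, a belief like this would become part of the search state, so that simulated trajectories account for how the agent's uncertainty about the true MDP shrinks as it observes transitions.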
Related papers
- Policy-driven World Model Adaptation For Robust Offline Model-based Reinforcement Learning (2025) 0.00
- Enhancing Offline Model-based RL Via Active Model Selection: A Bayesian Optimization Perspective (2025) 0.00
- MOReL: Model-based Offline Reinforcement Learning (2020) 0.00
- Conservative Bayesian Model-based Value Expansion For Offline Policy Optimization (2022) 0.00
- An Offline Risk-aware Policy Selection Method For Bayesian Markov Decision Processes (2021) 0.00
- Revisiting Design Choices In Offline Model-based Reinforcement Learning (2021) 6.34
- Offline Meta Learning Of Exploration (2020) 0.00
- One Risk To Rule Them All: A Risk-sensitive Perspective On Model-based Offline Reinforcement Learning (2022) 3.58