Conservative Bayesian Model-based Value Expansion For Offline Policy Optimization
2022 · Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, et al.
Abstract
Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy. Model-based approaches are particularly appealing in the offline setting since they can extract more learning signals from the logged dataset by learning a model of the environment. However, the performance of existing model-based approaches falls short of model-free counterparts, due to the compounding of estimation errors in the learned model. Driven by this observation, we argue that it is critical for a model-based method to understand when to trust the model and when to rely on model-free estimates, and how to act conservatively w.r.t. both. To this end, we derive an elegant and simple methodology called conservative Bayesian model-based value expansion for offline policy optimization (CBOP), that trades off model-free and model-based estimates during the policy evaluation step according to their epistemic uncertainties,
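The trade-off described in the abstract can be illustrated with a minimal sketch. It assumes each candidate value estimate (for example, a model-free estimate and model-based rollouts of different lengths) is summarized by a Gaussian mean and variance, that the estimates are fused by inverse-variance weighting under a flat prior, and that conservatism is applied as a lower confidence bound. The function name `fuse_value_estimates` and the parameter `lcb_coef` are hypothetical and not taken from the paper; this is not the authors' implementation.

```python
import numpy as np


def fuse_value_estimates(means, variances, lcb_coef=1.0):
    """Fuse several Gaussian value estimates by inverse-variance weighting.

    Assumption (not from the source): each estimate i is treated as
    N(means[i], variances[i]) and the prior is flat, so the posterior
    precision is the sum of the individual precisions.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    precisions = 1.0 / variances          # uncertain estimates get low weight
    post_var = 1.0 / precisions.sum()     # posterior variance
    post_mean = post_var * (precisions * means).sum()  # precision-weighted mean

    # Conservative target: a lower confidence bound on the fused estimate.
    conservative = post_mean - lcb_coef * np.sqrt(post_var)
    return post_mean, post_var, conservative


if __name__ == "__main__":
    # Hypothetical numbers: a confident short-horizon estimate and two
    # increasingly uncertain longer-horizon estimates.
    mean, var, target = fuse_value_estimates(
        means=[10.0, 12.0, 20.0],
        variances=[1.0, 4.0, 100.0],
    )
    print(mean, var, target)
```

Under these assumptions, an estimate with large variance (such as a long model rollout with compounding error) contributes little to the fused value, while a larger `lcb_coef` makes the resulting target more pessimistic.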
Related papers
- An Offline Risk-aware Policy Selection Method For Bayesian Markov Decision Processes (2021)
- COMBO: Conservative Offline Model-based Policy Optimization (2021)
- Deployment-efficient Reinforcement Learning Via Model-based Offline Optimization (2020)
- Expert-supervised Reinforcement Learning For Offline Policy Learning And Evaluation (2020)
- Enhancing Offline Model-based RL Via Active Model Selection: A Bayesian Optimization Perspective (2025)
- Confidence-conditioned Value Functions For Offline Reinforcement Learning (2022)
- Long-horizon Model-based Offline Reinforcement Learning Without Conservatism (2025)
- Constrained Latent Action Policies For Model-based Offline Reinforcement Learning (2024)