Revisiting Design Choices In Offline Model-based Reinforcement Learning
2021 · Cong Lu, Philip J. Ball, Jack Parker-Holder, et al.
Abstract
Offline reinforcement learning enables agents to leverage large pre-collected datasets of environment transitions to learn control policies, circumventing the need for potentially expensive or unsafe online data collection. Significant progress has been made recently in offline model-based reinforcement learning, approaches which leverage a learned dynamics model. This typically involves constructing a probabilistic model, and using the model uncertainty to penalize rewards where there is insufficient data, solving for a pessimistic MDP that lower bounds the true MDP. Existing methods, however, exhibit a breakdown between theory and practice, whereby pessimistic return ought to be bounded by the total variation distance of the model from the true dynamics, but is instead implemented through a penalty based on estimated model uncertainty. This has spawned a variety of uncertainty heuristics, with little to no comparison between differing approaches. In this paper, we compare these heuri
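As a rough illustration of the reward penalty the abstract describes, the sketch below subtracts an uncertainty estimate from the model-predicted reward. It is a minimal sketch under stated assumptions, not the authors' implementation: the ensemble-of-Gaussians setup, the two heuristics shown, the function names, and the penalty weight `lam` are all hypothetical choices made for this example.

```python
import numpy as np

# Assumed setup (not from the source): an ensemble of N probabilistic
# dynamics models, each predicting a Gaussian over the next state.
#   means, stds: arrays of shape (n_models, batch, state_dim)

def max_aleatoric(means, stds):
    """Heuristic A (illustrative): largest predicted std norm across the ensemble."""
    return np.linalg.norm(stds, axis=-1).max(axis=0)

def ensemble_std(means, stds):
    """Heuristic B (illustrative): disagreement between ensemble mean predictions."""
    return np.linalg.norm(means.std(axis=0), axis=-1)

def penalized_reward(reward, uncertainty, lam=1.0):
    """Pessimistic reward: r_tilde = r - lam * u(s, a). `lam` is a hypothetical weight."""
    return reward - lam * uncertainty

# Tiny worked example: 2 models, batch of 1 transition, 2-D state.
means = np.array([[[0.0, 0.0]], [[2.0, 0.0]]])
stds = np.array([[[3.0, 4.0]], [[0.0, 1.0]]])
reward = np.array([10.0])

r_a = penalized_reward(reward, max_aleatoric(means, stds), lam=0.5)
r_b = penalized_reward(reward, ensemble_std(means, stds), lam=0.5)
```

The two heuristics give different penalties for the same transition, which is why the choice of uncertainty estimate and penalty weight can matter for the resulting policy.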
Related papers
- Model-based Offline Reinforcement Learning With Pessimism-modulated Dynamics Belief (2022)
- An Offline Risk-aware Policy Selection Method For Bayesian Markov Decision Processes (2021)
- Overcoming Model Bias For Robust Offline Deep Reinforcement Learning (2020)
- MOReL: Model-based Offline Reinforcement Learning (2020)
- Conservative Bayesian Model-based Value Expansion For Offline Policy Optimization (2022)
- Constrained Latent Action Policies For Model-based Offline Reinforcement Learning (2024)
- Offline Vs. Online Learning In Model-based RL: Lessons For Data Collection Strategies (2025)
- Long-horizon Model-based Offline Reinforcement Learning Without Conservatism (2025)