Mitigating Distribution Shift In Model-based Offline RL Via Shifts-aware Reward Learning
2024 · Wang Luo, Haoran Li, Zicheng Zhang, et al.
Abstract
Model-based offline reinforcement learning trains policies using pre-collected datasets and learned environment models, eliminating the need for direct real-world environment interaction. However, this paradigm is inherently challenged by distribution shift (DS). Existing methods address this issue by leveraging off-policy mechanisms and estimating model uncertainty, but they often result in inconsistent objectives and lack a unified theoretical foundation. This paper offers a comprehensive analysis that disentangles the problem into two fundamental components: model bias and policy shift. Our theoretical and empirical investigations reveal how these factors distort value estimation and restrict policy optimization. To tackle these challenges, we derive a novel shifts-aware reward through a unified probabilistic inference framework, which modifies the vanilla reward to refine value learning and facilitate policy training. Building on this, we develop a practical implementation that lev
Related papers
- Bridging Distributionally Robust Learning And Offline RL: An Approach To Mitigate Distribution Shift And Partial Data Coverage (2023)
- ComaDICE: Offline Cooperative Multi-agent Reinforcement Learning With Stationary Distribution Shift Regularization (2024)
- Off-policy Policy Gradient Algorithms By Constraining The State Distribution Shift (2019)
- Regularizing A Model-based Policy Stationary Distribution To Stabilize Offline Reinforcement Learning (2022)
- DOMAIN: Mildly Conservative Model-based Offline Reinforcement Learning (2023)
- Assessing The Impact Of Distribution Shift On Reinforcement Learning Performance (2024)
- OptiDICE: Offline Policy Optimization Via Stationary Distribution Correction Estimation (2021)
- Distributionally Adaptive Meta Reinforcement Learning (2022)