Reward-consistent Dynamics Models Are Strongly Generalizable For Offline Reinforcement Learning
2023 · Fan-Ming Luo, Tian Xu, Xingchen Cao, et al.
Abstract
Learning a precise dynamics model can be crucial for offline reinforcement learning, which, unfortunately, has been found to be quite challenging. Dynamics models that are learned by fitting historical transitions often struggle to generalize to unseen transitions. In this study, we identify a hidden but pivotal factor termed dynamics reward that remains consistent across transitions, offering a pathway to better generalization. Therefore, we propose the idea of reward-consistent dynamics models: any trajectory generated by the dynamics model should maximize the dynamics reward derived from the data. We implement this idea as the MOREC (Model-based Offline reinforcement learning with Reward Consistency) method, which can be seamlessly integrated into previous offline model-based reinforcement learning (MBRL) methods. MOREC learns a generalizable dynamics reward function from offline data, which is subsequently employed as a transition filter in any offline MBRL method: when generating
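The transition-filter idea in the abstract can be illustrated with a minimal sketch. The abstract is cut off before it describes the filtering rule, so everything below is an assumption made for illustration: the filter is taken to score several candidate next states with a dynamics reward and resample one of them with softmax weights. The names `softmax`, `filter_transition`, `toy_dynamics_reward`, and the `temperature` parameter are hypothetical and do not come from the paper, and the toy reward stands in for a function that MOREC would learn from offline data.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Convert scores to probabilities; subtract the max for numerical stability."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def filter_transition(candidates, scores, temperature=1.0, rng=None):
    """Resample one candidate next state, favoring high dynamics reward.

    Assumed filtering rule (not confirmed by the source): softmax resampling.
    """
    rng = rng or random.Random()
    probs = softmax(scores, temperature)
    idx = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    return candidates[idx], probs


def toy_dynamics_reward(state, action, next_state):
    """Hypothetical stand-in for a learned dynamics reward.

    It scores a next state by how close it is to state + action.
    """
    return -sum((n - (s + a)) ** 2 for s, a, n in zip(state, action, next_state))


state = (0.0, 0.0)
action = (1.0, 0.0)
# Candidate next states, as if sampled from a learned dynamics model.
candidates = [(1.0, 0.0), (3.0, 3.0), (0.9, 0.1)]
scores = [toy_dynamics_reward(state, action, c) for c in candidates]
chosen, probs = filter_transition(
    candidates, scores, temperature=0.1, rng=random.Random(0)
)
```

With a low temperature the implausible candidate `(3.0, 3.0)` receives almost no probability mass, so the rollout continues from a next state the dynamics reward considers consistent with the data.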
Related papers
- MOBODY: Model Based Off-dynamics Offline Reinforcement Learning (2025)
- DARA: Dynamics-aware Reward Augmentation In Offline Reinforcement Learning (2022)
- Trajectory-wise Multiple Choice Learning For Dynamics Generalization In Reinforcement Learning (2020)
- Autoregressive Dynamics Models For Offline Policy Evaluation And Optimization (2021)
- Behavioral Priors And Dynamics Models: Improving Performance And Domain Transfer In Offline RL (2021)
- Model-based Offline Reinforcement Learning With Pessimism-modulated Dynamics Belief (2022)
- Any-step Dynamics Model Improves Future Predictions For Online And Offline Reinforcement Learning (2024)
- Model-based Offline Reinforcement Learning With Reliability-guaranteed Sequence Modeling (2025)