Live In The Moment: Learning Dynamics Model Adapted To Evolving Policy
2022 · Xiyao Wang, Wichayaporn Wongkamjan, Furong Huang
Abstract
Model-based reinforcement learning (RL) often achieves higher sample efficiency in practice than model-free RL by learning a dynamics model to generate samples for policy learning. Previous works learn a dynamics model that fits under the empirical state-action visitation distribution for all historical policies, i.e., the sample replay buffer. However, in this paper, we observe that fitting the dynamics model under the distribution for *all historical policies* does not necessarily benefit model prediction for the *current policy* since the policy in use is constantly evolving over time. The evolving policy during training will cause state-action visitation distribution shifts. We theoretically analyze how this distribution shift over historical policies affects the model learning and model rollouts. We then propose a novel dynamics model learning method, named *Policy-adapted Dynamics Model Learning (PDML)*. PDML dynamically adjusts the historical policy mixture distribution
Related papers
- Model-based Offline Reinforcement Learning With Pessimism-modulated Dynamics Belief (2022)
- Robust Adversarial Policy Optimization Under Dynamics Uncertainty (2026)
- State Regularized Policy Optimization On Data With Dynamics Shift (2023)
- Deep Reinforcement Learning In A Handful Of Trials Using Probabilistic Dynamics Models (2018)
- Autoregressive Dynamics Models For Offline Policy Evaluation And Optimization (2021)
- Policy Learning For Off-dynamics RL With Deficient Support (2024)
- PC-MLP: Model-based Reinforcement Learning With Policy Cover Guided Exploration (2021)
- Moments Matter: Stabilizing Policy Optimization Using Return Distributions (2026)