Live In The Moment: Learning Dynamics Model Adapted To Evolving Policy

Abstract

Model-based reinforcement learning (RL) often achieves higher sample efficiency in practice than model-free RL by learning a dynamics model to generate samples for policy learning. Previous works learn a dynamics model that fits under the empirical state-action visitation distribution for all historical policies, i.e., the sample replay buffer. However, in this paper, we observe that fitting the dynamics model under the distribution for *all historical policies* does not necessarily benefit model prediction for the *current policy* since the policy in use is constantly evolving over time. The evolving policy during training will cause state-action visitation distribution shifts. We theoretically analyze how this distribution shift over historical policies affects the model learning and model rollouts. We then propose a novel dynamics model learning method, named \textit\{Policy-adapted Dynamics Model Learning (PDML)\}. PDML dynamically adjusts the historical policy mixture distribution

Live In The Moment: Learning Dynamics Model Adapted To Evolving Policy

Abstract

Authors

Tags

Stats

Related papers