TOM: Learning Policy-aware Models For Model-based Reinforcement Learning Via Transition Occupancy Matching
2023 · Yecheng Jason Ma, Kausik Sivakumar, Jason Yan, et al.
Abstract
Standard model-based reinforcement learning (MBRL) approaches fit a transition model of the environment to all past experience, but this wastes model capacity on data that is irrelevant for policy improvement. We instead propose a new "transition occupancy matching" (TOM) objective for MBRL model learning: a model is good to the extent that the current policy experiences the same distribution of transitions inside the model as in the real environment. We derive TOM directly from a novel lower bound on the standard reinforcement learning objective. To optimize TOM, we show how to reduce it to a form of importance weighted maximum-likelihood estimation, where the automatically computed importance weights identify policy-relevant past experiences from a replay buffer, enabling stable optimization. TOM thus offers a plug-and-play model learning sub-routine that is compatible with any backbone MBRL algorithm. On various MuJoCo continuous robotic control tasks, we show that TOM successfully
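To make the phrase "importance weighted maximum-likelihood estimation" concrete, here is a minimal sketch of the general idea, not the paper's implementation. It assumes a toy scalar linear-Gaussian transition model with fixed noise, and it takes the importance weights as given inputs; how TOM actually computes those weights is not described in this abstract. The names `weighted_nll` and `fit_weighted_mle` are hypothetical.

```python
import math

def weighted_nll(theta, transitions, weights, sigma=1.0):
    """Importance-weighted Gaussian negative log-likelihood.

    Assumed toy model: s_next ~ Normal(theta[0]*s + theta[1]*a, sigma^2).
    `weights` are supplied by the caller (an assumption of this sketch).
    """
    const = math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
    total = 0.0
    for (s, a, s_next), w in zip(transitions, weights):
        mean = theta[0] * s + theta[1] * a
        total += w * (0.5 * ((s_next - mean) / sigma) ** 2 + const)
    return total / sum(weights)

def fit_weighted_mle(transitions, weights):
    """Closed-form minimiser of weighted_nll for fixed sigma.

    With Gaussian noise this reduces to weighted least squares,
    solved here via the 2x2 normal equations.
    """
    sxx = sxa = saa = sxy = say = 0.0
    for (s, a, s_next), w in zip(transitions, weights):
        sxx += w * s * s
        sxa += w * s * a
        saa += w * a * a
        sxy += w * s * s_next
        say += w * a * s_next
    det = sxx * saa - sxa * sxa
    theta0 = (saa * sxy - sxa * say) / det
    theta1 = (sxx * say - sxa * sxy) / det
    return theta0, theta1

# Replay-buffer stand-in: (state, action, next_state) tuples.
buffer = [(1.0, 0.0, 0.9), (0.0, 1.0, 0.5), (1.0, 1.0, 1.4),
          (2.0, -1.0, 1.3), (1.0, 1.0, 100.0)]
# A zero weight marks the last transition as irrelevant to the current policy.
weights = [1.0, 0.1, 2.0, 0.5, 0.0]
theta = fit_weighted_mle(buffer, weights)
```

In this toy setting a transition with weight zero has no influence on the fitted model, which mirrors the abstract's point that the weights pick out policy-relevant experience from the buffer.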
Related papers
- Bayes-adaptive Deep Model-based Policy Optimisation (2020)
- Deep Model-based Reinforcement Learning Via Estimated Uncertainty And Conservative Policy Optimization (2019)
- Enhancing Offline Model-based RL Via Active Model Selection: A Bayesian Optimization Perspective (2025)
- A Model-based Approach For Sample-efficient Multi-task Reinforcement Learning (2019)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, And Policies With One Objective (2022)
- The Virtues Of Laziness In Model-based RL: A Unified Objective And Algorithms (2023)
- Unified Policy Optimization For Continuous-action Reinforcement Learning In Non-stationary Tasks And Games (2022)
- Mismatched No More: Joint Model-policy Optimization For Model-based RL (2021)