Autoregressive Dynamics Models For Offline Policy Evaluation And Optimization
2021 · Michael R. Zhang, Tom Le Paine, Ofir Nachum, et al.
Abstract
Standard dynamics models for continuous control make use of feedforward computation to predict the conditional distribution of next state and reward given current state and action using a multivariate Gaussian with a diagonal covariance structure. This modeling choice assumes that different dimensions of the next state and reward are conditionally independent given the current state and action and may be driven by the fact that fully observable physics-based simulation environments entail deterministic transition dynamics. In this paper, we challenge this conditional independence assumption and propose a family of expressive autoregressive dynamics models that generate different dimensions of the next state and reward sequentially conditioned on previous dimensions. We demonstrate that autoregressive dynamics models indeed outperform standard feedforward models in log-likelihood on heldout transitions. Furthermore, we compare different model-based and model-free off-policy evaluation (
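To make the contrast between the two modeling choices concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the toy data, the linear-Gaussian conditional, and every function name are assumptions made for illustration, and conditioning on the current state and action is omitted to keep the example short. It fits (a) a diagonal Gaussian that treats the two target dimensions as independent and (b) an autoregressive factorization p(x1) p(x2 | x1), then compares average log-likelihood on held-out samples.

```python
import math
import random


def gaussian_logpdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def sample_targets(n, rng):
    """Toy 2-D targets (hypothetical): dimension 2 depends strongly on dimension 1."""
    data = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = 0.9 * x1 + rng.gauss(0.0, 0.1)
        data.append((x1, x2))
    return data


def mean_var(values):
    """Maximum-likelihood mean and variance of a list of floats."""
    m = sum(values) / len(values)
    v = sum((u - m) ** 2 for u in values) / len(values)
    return m, v


def fit_diagonal(data):
    """Independent Gaussian per dimension (diagonal covariance)."""
    return mean_var([a for a, _ in data]), mean_var([b for _, b in data])


def fit_autoregressive(data):
    """p(x1) is Gaussian; p(x2 | x1) is Gaussian with a mean linear in x1."""
    m1, v1 = mean_var([a for a, _ in data])
    m2, _ = mean_var([b for _, b in data])
    cov = sum((a - m1) * (b - m2) for a, b in data) / len(data)
    slope = cov / v1
    intercept = m2 - slope * m1
    resid_var = sum((b - (intercept + slope * a)) ** 2 for a, b in data) / len(data)
    return (m1, v1), (intercept, slope, resid_var)


def avg_loglik_diagonal(params, data):
    (m1, v1), (m2, v2) = params
    total = sum(gaussian_logpdf(a, m1, v1) + gaussian_logpdf(b, m2, v2) for a, b in data)
    return total / len(data)


def avg_loglik_autoregressive(params, data):
    (m1, v1), (intercept, slope, resid_var) = params
    total = sum(
        gaussian_logpdf(a, m1, v1) + gaussian_logpdf(b, intercept + slope * a, resid_var)
        for a, b in data
    )
    return total / len(data)


rng = random.Random(0)
train = sample_targets(2000, rng)
heldout = sample_targets(500, rng)

diag_params = fit_diagonal(train)
ar_params = fit_autoregressive(train)

diag_ll = avg_loglik_diagonal(diag_params, heldout)
ar_ll = avg_loglik_autoregressive(ar_params, heldout)

print(f"diagonal Gaussian held-out log-likelihood: {diag_ll:.3f}")
print(f"autoregressive    held-out log-likelihood: {ar_ll:.3f}")
```

On this toy data the autoregressive model scores higher because it can express the dependence of the second dimension on the first, which the diagonal model cannot. A model of the kind the abstract describes would additionally condition each dimension on the current state and action, which this sketch leaves out.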
Related papers
- MOBODY: Model Based Off-dynamics Offline Reinforcement Learning (2025)
- Any-step Dynamics Model Improves Future Predictions For Online And Offline Reinforcement Learning (2024)
- Revisiting Design Choices In Offline Model-based Reinforcement Learning (2021)
- Model-based Offline Reinforcement Learning With Pessimism-modulated Dynamics Belief (2022)
- Regularizing A Model-based Policy Stationary Distribution To Stabilize Offline Reinforcement Learning (2022)
- Long-horizon Rollout Via Dynamics Diffusion For Offline Reinforcement Learning (2024)
- Live In The Moment: Learning Dynamics Model Adapted To Evolving Policy (2022)
- Constrained Latent Action Policies For Model-based Offline Reinforcement Learning (2024)