Constrained Latent Action Policies For Model-based Offline Reinforcement Learning
2024 · Marvin Alles, Philip Becker-Ehmck, Patrick van der Smagt, et al.
Abstract
In offline reinforcement learning, a policy is learned using a static dataset in the absence of costly feedback from the environment. In contrast to the online setting, relying solely on static datasets poses additional challenges, such as policies generating out-of-distribution samples. Model-based offline reinforcement learning methods try to overcome these challenges by learning a model of the underlying dynamics of the environment and using it to guide policy search. This is beneficial, but with limited datasets, model errors and value overestimation on out-of-distribution states can degrade performance. Current model-based methods apply some notion of conservatism to the Bellman update, often implemented using uncertainty estimation derived from model ensembles. In this paper, we propose Constrained Latent Action Policies (C-LAP), which learns a generative model of the joint distribution of observations and actions. We cast policy learning as a constrained objective to always sta
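The abstract notes that current model-based methods often make the Bellman update conservative by using uncertainty estimates derived from model ensembles. A minimal Python sketch of one common form of that idea follows: the model-predicted reward is reduced in proportion to how much the ensemble members disagree about the next state. This illustrates the baseline style of conservatism the abstract describes, not C-LAP itself. The function name `penalized_reward`, the `penalty_weight` parameter, and the choice of the largest per-dimension standard deviation as the uncertainty measure are all assumptions made for illustration.

```python
from statistics import pstdev


def penalized_reward(reward, ensemble_predictions, penalty_weight=1.0):
    """Subtract an ensemble-disagreement penalty from a predicted reward.

    ensemble_predictions holds one predicted next-state vector per
    ensemble member (hypothetical layout, chosen for this sketch).
    """
    num_dims = len(ensemble_predictions[0])
    # Disagreement per state dimension: population standard deviation
    # of that dimension across all ensemble members.
    per_dim_std = [
        pstdev([member[d] for member in ensemble_predictions])
        for d in range(num_dims)
    ]
    # Assumption: use the largest per-dimension spread as a scalar
    # uncertainty estimate.
    uncertainty = max(per_dim_std)
    return reward - penalty_weight * uncertainty


# Members that agree leave the reward unchanged; members that
# disagree lower it.
agreeing = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
disagreeing = [[0.0, 2.0], [2.0, 2.0]]
reward_when_agreeing = penalized_reward(1.0, agreeing)
reward_when_disagreeing = penalized_reward(1.0, disagreeing, penalty_weight=0.5)
```

Under this sketch, transitions on which the ensemble disagrees contribute less to the Bellman target, which discourages the policy from exploiting poorly modeled regions.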
Related papers
- A Behavior Regularized Implicit Policy For Offline Reinforcement Learning (2022)
- Hypercube Policy Regularization Framework For Offline Reinforcement Learning (2024)
- Conservative Bayesian Model-based Value Expansion For Offline Policy Optimization (2022)
- A2PO: Towards Effective Offline Reinforcement Learning From An Advantage-aware Perspective (2024)
- Regularizing A Model-based Policy Stationary Distribution To Stabilize Offline Reinforcement Learning (2022)
- Revisiting Design Choices In Offline Model-based Reinforcement Learning (2021)
- State-constrained Offline Reinforcement Learning (2024)
- MOReL: Model-based Offline Reinforcement Learning (2020)