A Behavior Regularized Implicit Policy For Offline Reinforcement Learning
2022 · Shentao Yang, Zhendong Wang, Huangjie Zheng, et al.
Abstract
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment. The lack of environmental interactions makes the policy training vulnerable to state-action pairs far from the training dataset and prone to missing rewarding actions. For training more effective agents, we propose a framework that supports learning a flexible yet well-regularized fully-implicit policy. We further propose a simple modification to the classical policy-matching methods for regularizing with respect to the dual form of the Jensen-Shannon divergence and the integral probability metrics. We theoretically show the correctness of the policy-matching approach, and the correctness and a good finite-sample property of our modification. An effective instantiation of our framework through the GAN structure is provided, together with techniques to explicitly smooth the state-action mapping for robust generalization beyond the static dataset. Extensive experiment
Related papers
- Hypercube Policy Regularization Framework For Offline Reinforcement Learning (2024) 0.00
- Constrained Latent Action Policies For Model-based Offline Reinforcement Learning (2024) 0.00
- Regularizing A Model-based Policy Stationary Distribution To Stabilize Offline Reinforcement Learning (2022) 0.00
- Iteratively Refined Behavior Regularization For Offline Reinforcement Learning (2023) 2.26
- Adaptive Advantage-guided Policy Regularization For Offline Reinforcement Learning (2024) 3.09
- Constrained Policy Optimization With Explicit Behavior Density For Offline Reinforcement Learning (2023) 0.00
- Policy Regularization With Dataset Constraint For Offline Reinforcement Learning (2023) 0.00
- A2PO: Towards Effective Offline Reinforcement Learning From An Advantage-aware Perspective (2024) 1.69