Policy Regularization With Dataset Constraint For Offline Reinforcement Learning
2023 · Yuhang Ran, Yi-Chen Li, Fuxiang Zhang, et al.
Abstract
We consider the problem of learning the best possible policy from a fixed dataset, known as offline Reinforcement Learning (RL). A common family of existing offline RL methods is policy regularization, which typically constrains the learned policy to the distribution or support of the behavior policy. However, distribution and support constraints are overly conservative, since both force the policy to choose actions similar to those of the behavior policy at a given state. This limits the learned policy's performance, especially when the behavior policy is sub-optimal. In this paper, we find that regularizing the policy towards the nearest state-action pair can be more effective, and thus propose Policy Regularization with Dataset Constraint (PRDC). When updating the policy in a given state, PRDC searches the entire dataset for the nearest state-action sample and then restricts the policy with the action of this sample. Unlike previous works, PRDC can guide the policy with pr…
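The nearest-neighbour step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes Euclidean distance over concatenated state-action vectors, a brute-force search over the whole dataset, a hypothetical weight `beta` that scales the state part of the distance, and a squared-error penalty pulling the policy's action toward the retrieved dataset action. The function names `nearest_dataset_action` and `dataset_constraint_penalty` are hypothetical.

```python
import numpy as np


def nearest_dataset_action(state, policy_action, dataset_states, dataset_actions, beta=1.0):
    """Return the action of the dataset sample closest to (beta * state, policy_action).

    Assumptions (not taken from the paper): Euclidean distance over concatenated
    vectors, and a hypothetical `beta` weighting state distance against action distance.
    """
    # Query point: the current state paired with the action the policy proposes.
    query = np.concatenate([beta * state, policy_action])
    # Keys: every (state, action) sample in the fixed dataset, scaled the same way.
    keys = np.concatenate([beta * dataset_states, dataset_actions], axis=1)
    # Brute-force search over the entire dataset (a KD-tree would scale better).
    dists = np.linalg.norm(keys - query, axis=1)
    idx = int(np.argmin(dists))
    return dataset_actions[idx]


def dataset_constraint_penalty(state, policy_action, dataset_states, dataset_actions, beta=1.0):
    """Squared distance between the policy action and the retrieved dataset action."""
    target = nearest_dataset_action(state, policy_action, dataset_states, dataset_actions, beta)
    return float(np.sum((policy_action - target) ** 2))


if __name__ == "__main__":
    # Toy dataset with one-dimensional states and actions.
    states = np.array([[0.0], [1.0], [2.0]])
    actions = np.array([[0.5], [-0.5], [1.0]])
    s, a_pi = np.array([1.1]), np.array([0.9])
    # Small beta: the closest pair is (2.0, 1.0), even though its state differs.
    print(nearest_dataset_action(s, a_pi, states, actions, beta=1.0))
    # Large beta: state distance dominates, so the sample at state 1.0 is chosen.
    print(nearest_dataset_action(s, a_pi, states, actions, beta=10.0))
```

In this toy run, with `beta=1.0` the retrieved sample is `(2.0, 1.0)` rather than the sample whose state is closest, so the target action need not come from the query state itself. Increasing `beta` makes the search behave more like a per-state constraint, which is the conservative behavior the abstract argues against.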
Related papers
- Regularizing A Model-based Policy Stationary Distribution To Stabilize Offline Reinforcement Learning (2022)
- Hypercube Policy Regularization Framework For Offline Reinforcement Learning (2024)
- Adaptive Advantage-guided Policy Regularization For Offline Reinforcement Learning (2024)
- Iteratively Refined Behavior Regularization For Offline Reinforcement Learning (2023)
- Robust Offline Reinforcement Learning With Gradient Penalty And Constraint Relaxation (2022)
- Policy Constraint By Only Support Constraint For Offline Reinforcement Learning (2025)
- A Behavior Regularized Implicit Policy For Offline Reinforcement Learning (2022)
- A Dataset Perspective On Offline Reinforcement Learning (2021)