Policy Constraint By Only Support Constraint For Offline Reinforcement Learning
2025 · Yunkai Gao, Jiaming Guo, Fan Wu, et al.
Abstract
Offline reinforcement learning (RL) aims to optimize a policy from pre-collected datasets so as to maximize cumulative reward. However, offline RL is challenged by the distributional shift between the learned policy and the behavior policy, which leads to errors when Q-values are computed for out-of-distribution (OOD) actions. To mitigate this issue, policy constraint methods either constrain the learned policy's distribution toward that of the behavior policy or confine action selection to the support of the behavior policy. However, current policy constraint methods tend to be excessively conservative, which prevents the learned policy from surpassing the behavior policy's performance. In this work, we present Only Support Constraint (OSC), which is derived from maximizing the total probability of the learned policy within the support of the behavior policy, to address the conservatism of policy constraints. OSC presents a regularization term that only restricts policies t
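The abstract's central quantity, the total probability that the learned policy places inside the behavior policy's support, can be illustrated with a small Monte Carlo estimate. This is a minimal sketch under stated assumptions and not the OSC regularizer from the paper: it assumes one-dimensional actions at a single fixed state, a Gaussian learned policy N(mu, sigma^2), and a hypothetical support test that treats an action as in-support when a Gaussian kernel density estimate over the dataset actions exceeds a threshold `eps`. The names `kde_density`, `in_support`, `support_mass`, and `support_penalty`, and the defaults for `bandwidth`, `eps`, and `n_samples`, are illustrative choices.

```python
import math
import random
from typing import Sequence


def kde_density(a: float, data: Sequence[float], bandwidth: float) -> float:
    """Gaussian kernel density estimate of the behavior policy at action a (assumed estimator)."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((a - x) / bandwidth) ** 2) for x in data)


def in_support(a: float, data: Sequence[float], bandwidth: float, eps: float) -> bool:
    """Hypothetical support test: the estimated behavior density at a exceeds eps."""
    return kde_density(a, data, bandwidth) > eps


def support_mass(mu: float, sigma: float, data: Sequence[float],
                 bandwidth: float = 0.1, eps: float = 0.05,
                 n_samples: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of P_{a ~ pi}(a in supp(beta)) for pi = N(mu, sigma^2)."""
    rng = random.Random(seed)
    hits = sum(in_support(rng.gauss(mu, sigma), data, bandwidth, eps)
               for _ in range(n_samples))
    return hits / n_samples


def support_penalty(mu: float, sigma: float, data: Sequence[float], **kwargs) -> float:
    """Illustrative regularizer: probability mass of pi outside the estimated support."""
    return 1.0 - support_mass(mu, sigma, data, **kwargs)


if __name__ == "__main__":
    # Synthetic dataset actions drawn uniformly from [-0.5, 0.5] at one state.
    data_rng = random.Random(1)
    dataset_actions = [data_rng.uniform(-0.5, 0.5) for _ in range(100)]
    print(support_penalty(0.0, 0.2, dataset_actions))  # policy centred on the data
    print(support_penalty(2.0, 0.2, dataset_actions))  # policy far from the data
```

With dataset actions spread over [-0.5, 0.5], the policy N(0, 0.2^2) keeps almost all of its mass inside the estimated support and receives a penalty near 0, while N(2, 0.2^2) lies almost entirely outside it and receives a penalty near 1. Because the hard indicator has zero gradient almost everywhere, a gradient-based learner would need a smooth surrogate for it; the sketch only evaluates the quantity.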
Related papers
- Policy Regularization With Dataset Constraint For Offline Reinforcement Learning (2023)
- Constraints Penalized Q-learning For Safe Offline Reinforcement Learning (2021)
- State-constrained Offline Reinforcement Learning (2024)
- Mildly Conservative Q-learning For Offline Reinforcement Learning (2022)
- Robust Offline Reinforcement Learning With Gradient Penalty And Constraint Relaxation (2022)
- Confidence-conditioned Value Functions For Offline Reinforcement Learning (2022)
- Constrained Latent Action Policies For Model-based Offline Reinforcement Learning (2024)
- Hypercube Policy Regularization Framework For Offline Reinforcement Learning (2024)