Iteratively Refined Behavior Regularization For Offline Reinforcement Learning
2023 Β· Xiaohan Hu, Yi Ma, Chenjun Xiao, et al.
Abstract
One of the fundamental challenges for offline reinforcement learning (RL) is ensuring robustness to data distribution. Whether the data originates from a near-optimal policy or not, we anticipate that an algorithm should demonstrate its ability to learn an effective control policy that seamlessly aligns with the inherent distribution of offline data. Unfortunately, behavior regularization, a simple yet effective offline RL algorithm, tends to struggle in this regard. In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration. Our key observation is that by iteratively refining the reference policy used for behavior regularization, conservative policy update guarantees gradually improvement, while also implicitly avoiding querying out-of-sample actions to prevent catastrophic learning failures. We prove that in the tabular setting this algorithm is capable of learning the optimal policy covered by the offline data
Authors
(none)
Tags
Stats
Related papers
- Regularizing A Model-based Policy Stationary Distribution To Stabilize Offline Reinforcement Learning (2022)0.00
- BRAC+: Improved Behavior Regularized Actor Critic For Offline Reinforcement Learning (2021)0.00
- Policy Regularization With Dataset Constraint For Offline Reinforcement Learning (2023)0.00
- A Behavior Regularized Implicit Policy For Offline Reinforcement Learning (2022)0.00
- Deployment-efficient Reinforcement Learning Via Model-based Offline Optimization (2020)0.00
- Finetuning From Offline Reinforcement Learning: Challenges, Trade-offs And Practical Solutions (2023)0.00
- PROTO: Iterative Policy Regularized Offline-to-online Reinforcement Learning (2023)0.00
- Robust Offline Reinforcement Learning With Gradient Penalty And Constraint Relaxation (2022)0.00