Decoupled Prioritized Resampling For Offline RL
2023 · Yang Yue, Bingyi Kang, Xiao Ma, et al.
Abstract
Offline reinforcement learning (RL) is challenged by the distributional shift problem. To address this problem, existing works mainly focus on designing sophisticated policy constraints between the learned policy and the behavior policy. However, these constraints are applied equally to well-performing and inferior actions through uniform sampling, which might negatively affect the learned policy. To alleviate this issue, we propose Offline Prioritized Experience Replay (OPER), featuring a class of priority functions designed to prioritize highly-rewarding transitions, making them more frequently visited during training. Through theoretical analysis, we show that this class of priority functions induces an improved behavior policy, and when constrained to this improved policy, a policy-constrained offline RL algorithm is likely to yield a better solution. We develop two practical strategies to obtain priority weights by estimating advantages based on a fitted value network (OPER-A) or u
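The resampling idea described in the abstract can be illustrated with a minimal sketch. It assumes each transition already has a scalar score (for example an advantage estimate) and replaces uniform sampling with sampling proportional to a nonnegative weight derived from that score. The function names, the shift-by-minimum weighting, and the `eps` parameter are hypothetical choices made for illustration; they are not taken from the paper, whose actual priority functions are not reproduced here.

```python
import numpy as np


def priorities_from_scores(scores, eps=1e-3):
    """Turn per-transition scores into sampling probabilities.

    Hypothetical weighting: shift scores so the lowest one receives
    weight `eps`, then normalize. The paper's priority functions may differ.
    """
    scores = np.asarray(scores, dtype=np.float64)
    weights = scores - scores.min() + eps  # strictly positive weights
    return weights / weights.sum()


def prioritized_batch_indices(probs, batch_size, rng):
    """Draw dataset indices with replacement, proportional to `probs`."""
    return rng.choice(len(probs), size=batch_size, replace=True, p=probs)


# Toy dataset of four transitions with made-up scores.
scores = np.array([-1.0, 0.0, 0.5, 2.0])
probs = priorities_from_scores(scores)

rng = np.random.default_rng(0)
idx = prioritized_batch_indices(probs, batch_size=10_000, rng=rng)
counts = np.bincount(idx, minlength=len(scores))
```

In this toy setup the highest-scoring transition is drawn most often and the lowest-scoring one is drawn only rarely, which is the qualitative effect the abstract attributes to prioritization. An offline RL algorithm would then train on batches indexed by `idx` instead of uniformly drawn batches.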