Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
2024 · Yu Luo, Tianying Ji, Fuchun Sun, et al.
Abstract
Off-policy reinforcement learning (RL) has achieved notable success in tackling many complex real-world tasks, by leveraging previously collected data for policy learning. However, most existing off-policy RL algorithms fail to maximally exploit the information in the replay buffer, limiting sample efficiency and policy performance. In this work, we discover that concurrently training an offline RL policy based on the shared online replay buffer can sometimes outperform the original online learning policy, though the occurrence of such performance gains remains uncertain. This motivates a new possibility of harnessing the emergent outperforming offline optimal policy to improve online policy learning. Based on this insight, we present Offline-Boosted Actor-Critic (OBAC), a model-free online RL framework that elegantly identifies the outperforming offline policy through value comparison, and uses it as an adaptive constraint to guarantee stronger policy learning performance. Our experim
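The abstract describes the mechanism only at a high level. An offline policy is trained on the same replay buffer, its value is compared with that of the online policy, and it constrains the online update only where it looks better. The sketch below illustrates that per-state gating in NumPy under stated assumptions. The function name `adaptive_actor_loss`, the weight `alpha`, and the use of a log-likelihood (behavior-cloning-style) penalty as the constraint are hypothetical choices made for illustration. The value estimates are passed in as given arrays rather than learned. This is not the authors' implementation, and the paper's exact objective may differ.

```python
import numpy as np


def adaptive_actor_loss(q_online, v_online, v_offline, logp_offline_actions, alpha=1.0):
    """Hypothetical OBAC-style actor loss (illustrative sketch, not the authors' code).

    All arguments are 1-D arrays over a batch of states s:
      q_online             -- Q(s, a) for actions a sampled from the online policy pi
      v_online             -- estimated value of pi at s
      v_offline            -- estimated value of the offline policy at s
      logp_offline_actions -- log pi(a_off | s) for actions a_off from the offline policy
    alpha is an assumed constraint weight, not a value taken from the paper.
    """
    q_online = np.asarray(q_online, dtype=np.float64)
    v_online = np.asarray(v_online, dtype=np.float64)
    v_offline = np.asarray(v_offline, dtype=np.float64)
    logp_offline_actions = np.asarray(logp_offline_actions, dtype=np.float64)

    # Value comparison: 1.0 where the offline policy looks better, else 0.0.
    mask = (v_offline > v_online).astype(np.float64)
    # Standard actor term: maximizing Q is the same as minimizing -Q.
    improvement = -q_online
    # Assumed constraint: raise the likelihood of offline actions under pi,
    # but only in states where the offline policy wins the value comparison.
    constraint = -alpha * mask * logp_offline_actions
    return float(np.mean(improvement + constraint)), mask


# Toy batch of two states; only the first has v_offline > v_online.
loss, mask = adaptive_actor_loss(
    q_online=[1.0, 2.0],
    v_online=[1.0, 2.0],
    v_offline=[1.5, 1.0],
    logp_offline_actions=[-0.5, -3.0],
)
print(mask)  # [1. 0.]
print(loss)  # -1.25
```

In the toy batch, the second state keeps the plain actor objective because the online policy already scores higher there, which is the "adaptive" part of the constraint: it switches on per state rather than being applied uniformly as in fixed behavior regularization.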
Related papers
- BRAC+: Improved Behavior Regularized Actor Critic For Offline Reinforcement Learning (2021)
- Offline Retraining For Online RL: Decoupled Policy Learning To Mitigate Exploration Bias (2023)
- Optimistic Critic Reconstruction And Constrained Fine-tuning For General Offline-to-online RL (2024)
- Adaptive Replay Buffer For Offline-to-online Reinforcement Learning (2025)
- POPO: Pessimistic Offline Policy Optimization (2020)
- Efficient Offline Reinforcement Learning: First Imitate, Then Improve (2024)
- Active Advantage-aligned Online Reinforcement Learning With Offline Data (2025)
- Optimization Solution Functions As Deterministic Policies For Offline Reinforcement Learning (2024)