SAMG: Offline-to-online Reinforcement Learning Via State-action-conditional Offline Model Guidance
2024 · Liyu Zhang, Haochi Wu, Xu Wan, et al.
Abstract
Offline-to-online (O2O) reinforcement learning (RL) pre-trains models on offline data and refines policies through online fine-tuning. However, existing O2O RL algorithms typically require maintaining cumbersome offline datasets to mitigate the effects of out-of-distribution (OOD) data, which significantly limits their efficiency in exploiting online samples. To address this deficiency, we introduce a new paradigm for O2O RL called State-Action-Conditional Offline Model Guidance (SAMG). It freezes the pre-trained offline critic to provide compact offline understanding for each state-action sample, thus eliminating the need for retraining on offline data. The frozen offline critic is combined with the online target critic, weighted by a state-action-adaptive coefficient. This coefficient aims to capture the offline degree of samples at the state-action level and is updated adaptively during training. In practice, SAMG can be easily integrated with Q-function-based algorithms. Th
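The central mechanism in the abstract, mixing a frozen offline critic's value with the online target critic's value under a per-sample coefficient, can be illustrated as a TD-target computation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the combination is a convex mixture (1 − w)·Q_online + w·Q_offline inside a standard one-step target, and it treats the coefficient w(s, a) as a given input in [0, 1], because the abstract does not say how the coefficient is estimated or updated. The function name `blended_td_target` and all parameter names are hypothetical.

```python
from typing import List, Sequence


def blended_td_target(
    rewards: Sequence[float],
    dones: Sequence[float],
    q_online_next: Sequence[float],
    q_offline_next: Sequence[float],
    offline_weight: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Hypothetical per-sample TD target mixing online and frozen offline critics.

    Assumption (not confirmed by the abstract): the bootstrapped next-state value
    is a convex combination (1 - w) * Q_online_target + w * Q_offline_frozen,
    where w is the state-action-adaptive coefficient for that sample.
    """
    n = len(rewards)
    if not all(len(x) == n for x in (dones, q_online_next, q_offline_next, offline_weight)):
        raise ValueError("all per-sample inputs must have the same length")

    targets: List[float] = []
    for r, done, q_on, q_off, w in zip(
        rewards, dones, q_online_next, q_offline_next, offline_weight
    ):
        if not 0.0 <= w <= 1.0:
            raise ValueError(f"offline weight must lie in [0, 1], got {w}")
        # The frozen offline critic contributes in proportion to w;
        # the online target critic supplies the remaining (1 - w).
        blended_next = (1.0 - w) * q_on + w * q_off
        # Standard one-step bootstrap; terminal transitions (done = 1) do not bootstrap.
        targets.append(r + gamma * (1.0 - done) * blended_next)
    return targets


if __name__ == "__main__":
    # Two illustrative transitions: the second is terminal, so its target is just its reward.
    print(
        blended_td_target(
            rewards=[1.0, 0.0],
            dones=[0.0, 1.0],
            q_online_next=[10.0, 5.0],
            q_offline_next=[6.0, 2.0],
            offline_weight=[0.5, 0.9],
            gamma=0.9,
        )
    )
```

With w = 0 the function reduces to the ordinary online TD target, and with w = 1 it bootstraps only from the frozen offline critic, so a per-sample w interpolates between trusting online learning and trusting offline knowledge for each state-action pair.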
Related papers
- Optimistic Critic Reconstruction And Constrained Fine-tuning For General Offline-to-online RL (2024)
- Towards Robust Offline-to-online Reinforcement Learning Via Uncertainty And Smoothness (2023)
- Offline-to-online Multi-agent Reinforcement Learning With Offline Value Function Memory And Sequential Exploration (2024)
- Beyond OOD State Actions: Supported Cross-domain Offline Reinforcement Learning (2023)
- Train Once, Get A Family: State-adaptive Balances For Offline-to-online Reinforcement Learning (2023)
- A Perspective Of Q-value Estimation On Offline-to-online Reinforcement Learning (2023)
- Towards Fast Safe Online Reinforcement Learning Via Policy Finetuning (2024)
- Boosting Offline Reinforcement Learning With Residual Generative Modeling (2021)