Reward-agnostic Fine-tuning: Provable Statistical Benefits Of Hybrid Reinforcement Learning
2023 · Gen Li, Wenhao Zhan, Jason D. Lee, et al.
Abstract
This paper studies tabular reinforcement learning (RL) in the hybrid setting, which assumes access to both an offline dataset and online interactions with the unknown environment. A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset and enable effective policy fine-tuning. Leveraging recent advances in reward-agnostic exploration and model-based offline RL, we design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL -- in terms of sample complexities. The proposed algorithm does not require any reward information during data collection. Our theory is developed based on a new notion called single-policy partial concentrability, which captures the trade-off between distribution mismatch and miscoverage and guides the interplay between offline and online data.
Related papers
- Hybrid RL: Using Both Offline And Online Data Can Make RL Efficient (2022)
- Leveraging Offline Data In Online Reinforcement Learning (2022)
- Finetuning From Offline Reinforcement Learning: Challenges, Trade-offs And Practical Solutions (2023)
- Optimality Inductive Biases And Agnostic Guidelines For Offline Reinforcement Learning (2021)
- Efficient Online Reinforcement Learning Fine-tuning Need Not Retain Offline Data (2024)
- Offline Retraining For Online RL: Decoupled Policy Learning To Mitigate Exploration Bias (2023)
- Active Advantage-aligned Online Reinforcement Learning With Offline Data (2025)
- Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency From Shifted-dynamics Data (2024)