Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency From Shifted-dynamics Data
2024 · Chengrui Qu, Laixi Shi, Kishan Panaganti, et al.
Abstract
Online reinforcement learning (RL) typically requires high-stakes online interaction data to learn a policy for a target task. This prompts interest in leveraging historical data to improve sample efficiency. The historical data may come from outdated or related source environments with different dynamics. It remains unclear how to use such data effectively in the target task to provably improve learning and sample efficiency. To address this, we propose a hybrid transfer RL (HTRL) setting, in which an agent learns in a target environment while accessing offline data from a source environment with shifted dynamics. We show that, without information on the dynamics shift, general shifted-dynamics data does not reduce sample complexity in the target environment, even when the shift is subtle. However, with prior information on the degree of the dynamics shift, we design HySRL, a transfer algorithm that achieves problem-dependent sample complexity and outperforms pure online RL. Finally, our
Related papers
- When To Trust Your Simulator: Dynamics-aware Hybrid Offline-and-online Reinforcement Learning (2022)
- Hybrid RL: Using Both Offline And Online Data Can Make RL Efficient (2022)
- Reward-agnostic Fine-tuning: Provable Statistical Benefits Of Hybrid Reinforcement Learning (2023)
- H2O+: An Improved Framework For Hybrid Offline-and-online RL With Dynamics Gaps (2023)
- Can RLHF Be More Efficient With Imperfect Reward Models? A Policy Coverage Perspective (2025)
- Bridging Distributionally Robust Learning And Offline RL: An Approach To Mitigate Distribution Shift And Partial Data Coverage (2023)
- State Regularized Policy Optimization On Data With Dynamics Shift (2023)
- Human-inspired Framework To Accelerate Reinforcement Learning (2023)