When To Trust Your Simulator: Dynamics-aware Hybrid Offline-and-online Reinforcement Learning
2022 · Haoyi Niu, Shubham Sharma, Yiwen Qiu, et al.
Abstract
Learning effective reinforcement learning (RL) policies to solve real-world complex tasks can be quite challenging without a high-fidelity simulation environment. In most cases, we are only given imperfect simulators with simplified dynamics, which inevitably lead to severe sim-to-real gaps in RL policy learning. The recently emerged field of offline RL provides another possibility to learn policies directly from pre-collected historical data. However, to achieve reasonable performance, existing offline RL algorithms need impractically large offline data with sufficient state-action space coverage for training. This brings up a new question: is it possible to combine learning from limited real data in offline RL and unrestricted exploration through imperfect simulators in online RL to address the drawbacks of both approaches? In this study, we propose the Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning (H2O) framework to provide an affirmative answer to this question. H
Related papers
- H2O+: An Improved Framework For Hybrid Offline-and-online RL With Dynamics Gaps (2023)
- Towards Data-driven Offline Simulations For Online Reinforcement Learning (2022)
- AWAC: Accelerating Online Reinforcement Learning With Offline Datasets (2020)
- Hybrid RL: Using Both Offline And Online Data Can Make RL Efficient (2022)
- Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency From Shifted-dynamics Data (2024)
- Offline Retraining For Online RL: Decoupled Policy Learning To Mitigate Exploration Bias (2023)
- Persim: Data-efficient Offline Reinforcement Learning With Heterogeneous Agents Via Personalized Simulators (2021)
- Active Advantage-aligned Online Reinforcement Learning With Offline Data (2025)