Leveraging Offline Data In Online Reinforcement Learning
2022 · Andrew Wagenmaker, Aldo Pacchiano
Abstract
Two central paradigms have emerged in the reinforcement learning (RL) community: online RL and offline RL. In the online RL setting, the agent has no prior knowledge of the environment, and must interact with it in order to find an \(\epsilon\)-optimal policy. In the offline RL setting, the learner instead has access to a fixed dataset to learn from, but is unable to otherwise interact with the environment, and must obtain the best policy it can from this offline data. Practical scenarios often motivate an intermediate setting: if we have some set of offline data and, in addition, may also interact with the environment, how can we best use the offline data to minimize the number of online interactions necessary to learn an \(\epsilon\)-optimal policy? In this work, we consider this setting, which we call the \(\textsf{FineTuneRL}\) setting, for MDPs with linear structure. We characterize the necessary number of online samples needed in this setting given access to some offline dataset,
Related papers
- Efficient Online Reinforcement Learning Fine-tuning Need Not Retain Offline Data (2024)
- Finetuning From Offline Reinforcement Learning: Challenges, Trade-offs And Practical Solutions (2023)
- Reward-agnostic Fine-tuning: Provable Statistical Benefits Of Hybrid Reinforcement Learning (2023)
- Policy Finetuning: Bridging Sample-efficient Offline And Online Reinforcement Learning (2021)
- The Three Regimes Of Offline-to-online Reinforcement Learning (2025)
- Optimistic Critic Reconstruction And Constrained Fine-tuning For General Offline-to-online RL (2024)
- AWAC: Accelerating Online Reinforcement Learning With Offline Datasets (2020)
- Planning To Go Out-of-distribution In Offline-to-online Reinforcement Learning (2023)