Offline Meta-reinforcement Learning With Online Self-supervision
2021 · Vitchyr H. Pong, Ashvin Nair, Laura Smith, et al.
Abstract
Meta-reinforcement learning (RL) methods can meta-train policies that adapt to new tasks with orders of magnitude less data than standard RL, but meta-training itself is costly and time-consuming. If we can meta-train on offline data, then we can reuse the same static dataset, labeled once with rewards for different tasks, to meta-train policies that adapt to a variety of new tasks at meta-test time. Although this capability would make meta-RL a practical tool for real-world use, offline meta-RL presents additional challenges beyond online meta-RL or standard offline RL settings. Meta-RL learns an exploration strategy that collects data for adapting, and also meta-trains a policy that quickly adapts to data from a new task. Since this policy was meta-trained on a fixed, offline dataset, it might behave unpredictably when adapting to data collected by the learned exploration strategy, which differs systematically from the offline data and thus induces distributional shift. We propose a
Related papers
- Offline Meta-reinforcement Learning With Advantage Weighting (2020) 0.00
- Offline Meta Learning Of Exploration (2020) 0.00
- Distributionally Adaptive Meta Reinforcement Learning (2022) 2.26
- Offline Retraining For Online RL: Decoupled Policy Learning To Mitigate Exploration Bias (2023) 2.56
- A Tutorial On Meta-reinforcement Learning (2023) 10.85
- Finetuning From Offline Reinforcement Learning: Challenges, Trade-offs And Practical Solutions (2023) 0.00
- Entropy Regularized Task Representation Learning For Offline Meta-reinforcement Learning (2024) 0.00
- Guided Meta-policy Search (2019) 0.00