RL\(^3\): Boosting Meta Reinforcement Learning Via RL Inside RL\(^2\)
2023 · Abhinav Bhatia, Samer B. Nashed, Shlomo Zilberstein
Abstract
Meta reinforcement learning (Meta-RL) methods such as RL\(^2\) have emerged as promising approaches for learning data-efficient RL algorithms tailored to a given task distribution. However, they show poor asymptotic performance and struggle with out-of-distribution tasks because they rely on sequence models, such as recurrent neural networks or transformers, to process experiences rather than summarize them using general-purpose RL components such as value functions. In contrast, traditional RL algorithms are data-inefficient as they do not use domain knowledge, but do converge to an optimal policy in the limit. We propose RL\(^3\), a principled hybrid approach that incorporates action-values, learned per task via traditional RL, in the inputs to Meta-RL. We show that RL\(^3\) earns a greater cumulative reward in the long term compared to RL\(^2\) while drastically reducing meta-training time and generalizes better to out-of-distribution tasks. Experiments are conducted on Meta-RL benc
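The core idea of RL\(^3\) is to run a conventional RL learner inside each task and feed its action-value estimates into the Meta-RL sequence model. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes tabular, discrete states and uses one-step Q-learning as a stand-in for "traditional RL"; the paper may use a different per-task estimator. It also assumes an RL\(^2\)-style per-step input of (observation, previous action, previous reward). `TabularQEstimator`, `meta_input`, and the concatenation layout are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class TabularQEstimator:
    """Hypothetical per-task action-value learner (assumed: one-step Q-learning)."""

    def __init__(self, num_actions, gamma=0.99, alpha=0.1):
        self.num_actions = num_actions
        self.gamma = gamma  # discount factor
        self.alpha = alpha  # learning rate
        # Unseen states default to all-zero action-values.
        self.q = defaultdict(lambda: [0.0] * num_actions)

    def reset(self):
        # A new task starts from fresh estimates; only the meta-learner persists.
        self.q.clear()

    def update(self, s, a, r, s_next, done):
        # Q-learning target: r + gamma * max_a' Q(s', a'), or just r at episode end.
        target = r if done else r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

    def values(self, s):
        return list(self.q[s])


def meta_input(obs_onehot, prev_action, prev_reward, q_values, num_actions):
    """Hypothetical per-step input to the sequence model.

    RL^2-style part: observation, one-hot previous action, previous reward.
    RL^3-style addition (as sketched here): current action-value estimates.
    """
    action_onehot = [0.0] * num_actions
    if prev_action is not None:
        action_onehot[prev_action] = 1.0
    return list(obs_onehot) + action_onehot + [float(prev_reward)] + list(q_values)


# Usage: one transition in a 2-state, 2-action task.
est = TabularQEstimator(num_actions=2, gamma=0.9, alpha=0.5)
est.update(s=0, a=1, r=1.0, s_next=1, done=False)
x = meta_input([1.0, 0.0], prev_action=1, prev_reward=1.0,
               q_values=est.values(0), num_actions=2)
print(x)  # [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.5]
```

Because the Q-estimates are refined by ordinary RL updates within a task, they keep improving however long the task runs, which is the property the abstract credits for better asymptotic and out-of-distribution behavior. The sequence model can then lean on these summaries rather than having to extract the same information from the raw history on its own.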
Related papers
- A Tutorial On Meta-reinforcement Learning (2023)
- Learning To Reinforcement Learn (2016)
- Improving Generalization In Meta Reinforcement Learning Using Learned Objectives (2019)
- HMRL: Hyper-meta Learning For Sparse Reward Reinforcement Learning Problem (2020)
- Black Box Meta-learning Intrinsic Rewards (2024)
- Guided Meta-policy Search (2019)
- Meta-q-learning (2019)
- Model-based Adversarial Meta-reinforcement Learning (2020)