Meta-model-based Meta-policy Optimization
2020 Β· Takuya Hiraoka, Takahisa Imagawa, Voot Tangkaratt, et al.
Abstract
Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.
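The abstract says the analysis extends the theorems of Janner et al. (2019). To show what such a performance guarantee looks like, assume the single-task bound from that work has the form η[π] ≥ η̂[π] − C(ε_m, ε_π). Here η̂ is the return under the learned model and C = 2γ·r_max·(ε_m + 2ε_π)/(1−γ)² + 4·r_max·ε_π/(1−γ). The symbol ε_m bounds the model's per-step total-variation error, and ε_π bounds the shift between the data-collecting policy and the current policy. This is not the M3PO theorem itself. The sketch below only evaluates that assumed penalty, and the function names are hypothetical.

```python
def return_gap(r_max: float, gamma: float, eps_m: float, eps_pi: float) -> float:
    """Penalty C(eps_m, eps_pi) in the assumed Janner et al. (2019) bound
    eta[pi] >= eta_hat[pi] - C, where eta_hat is the return under the model.

    eps_m:  bound on the per-step total-variation error of the learned model
    eps_pi: bound on the divergence between data-collecting and current policy
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if min(r_max, eps_m, eps_pi) < 0.0:
        raise ValueError("r_max, eps_m and eps_pi must be non-negative")
    # Model error plus policy shift, amplified by the squared effective horizon.
    compounding = 2.0 * gamma * r_max * (eps_m + 2.0 * eps_pi) / (1.0 - gamma) ** 2
    # Extra policy-shift term with a single effective-horizon factor.
    policy_shift = 4.0 * r_max * eps_pi / (1.0 - gamma)
    return compounding + policy_shift


def true_return_lower_bound(model_return: float, r_max: float, gamma: float,
                            eps_m: float, eps_pi: float) -> float:
    """Lower bound on the true return implied by the assumed bound (hypothetical helper)."""
    return model_return - return_gap(r_max, gamma, eps_m, eps_pi)


if __name__ == "__main__":
    # Same model error, two discount factors.
    for g in (0.9, 0.99):
        print(g, return_gap(r_max=1.0, gamma=g, eps_m=0.01, eps_pi=0.0))
```

Consider the case r_max = 1, ε_m = 0.01 and ε_π = 0. The penalty is about 1.8 at γ = 0.9 and about 198 at γ = 0.99. This shows how quickly the (1−γ)² denominator amplifies even small model errors.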
Related papers
- Meta-reinforcement Learning With Universal Policy Adaptation: Provable Near-optimality Under All-task Optimum Comparator (2024)
- Model-based Adversarial Meta-reinforcement Learning (2020)
- RL³: Boosting Meta Reinforcement Learning Via RL Inside RL² (2023)
- Algorithmic Framework For Model-based Deep Reinforcement Learning With Theoretical Guarantees (2018)
- Meta-q-learning (2019)
- Mismatched No More: Joint Model-policy Optimization For Model-based RL (2021)
- Double Meta-learning For Data Efficient Policy Optimization In Non-stationary Environments (2020)
- Guided Meta-policy Search (2019)