ProMP: Proximal Meta-Policy Search
2018 · Jonas Rothfuss, Dennis Lee, Ignasi Clavera, et al.
Abstract
Credit assignment in meta-reinforcement learning (Meta-RL) is still poorly understood. Existing methods either neglect credit assignment to pre-adaptation behavior or implement it naively. This leads to poor sample efficiency during meta-training as well as ineffective task identification strategies. This paper provides a theoretical analysis of credit assignment in gradient-based Meta-RL. Building on these insights, we develop a novel meta-learning algorithm that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients. By controlling the statistical distance of both pre-adaptation and adapted policies during meta-policy search, the proposed algorithm enables efficient and stable meta-learning. Our approach leads to superior pre-adaptation policy behavior and consistently outperforms previous Meta-RL algorithms in sample efficiency, wall-clock time, and asymptotic performance.
Related papers
- Meta-reinforcement Learning With Universal Policy Adaptation: Provable Near-optimality Under All-task Optimum Comparator (2024)
- Credit Assignment With Meta-policy Gradient For Multi-agent Reinforcement Learning (2021)
- Guided Meta-policy Search (2019)
- Assigning Credit With Partial Reward Decoupling In Multi-agent Proximal Policy Optimization (2024)
- Model-based Adversarial Meta-reinforcement Learning (2020)
- Efficient Meta Reinforcement Learning For Preference-based Fast Adaptation (2022)
- Meta-Q-Learning (2019)
- A Tutorial On Meta-reinforcement Learning (2023)