On The Convergence Theory Of Debiased Model-agnostic Meta-reinforcement Learning
2020 · Alireza Fallah, Kristian Georgiev, Aryan Mokhtari, et al.
Abstract
We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems, where the goal is to find a policy using data from several tasks represented by Markov Decision Processes (MDPs) that can be updated by one step of stochastic policy gradient for the realized MDP. In particular, using stochastic gradients in MAML update steps is crucial for RL problems since computation of exact gradients requires access to a large number of possible trajectories. For this formulation, we propose a variant of the MAML method, named Stochastic Gradient Meta-Reinforcement Learning (SG-MRL), and study its convergence properties. We derive the iteration and sample complexity of SG-MRL to find an \(\epsilon\)-first-order stationary point, which, to the best of our knowledge, provides the first convergence guarantee for model-agnostic meta-reinforcement learning algorithms. We further show how our results extend to the case where more than one step of stochastic policy gradient
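As a rough sketch of the formulation the abstract describes, the one-step objective can be written as below. The notation is assumed here for illustration and is not quoted from the paper: \(p\) is a distribution over tasks, \(J_i(\theta)\) is the expected return of the policy with parameter \(\theta\) on MDP \(i\), \(\alpha\) is a step size, and \(\tilde{\nabla} J_i(\theta; \mathcal{D}_i)\) is a stochastic policy gradient estimated from a batch of trajectories \(\mathcal{D}_i\).

\[
\max_{\theta} \; V(\theta) := \mathbb{E}_{i \sim p} \, \mathbb{E}_{\mathcal{D}_i} \left[ J_i\!\left(\theta + \alpha \, \tilde{\nabla} J_i(\theta; \mathcal{D}_i)\right) \right]
\]

Under this reading, an \(\epsilon\)-first-order stationary point is a parameter \(\theta\) with \(\|\nabla V(\theta)\| \le \epsilon\), understood in expectation when \(\theta\) is the random output of a stochastic algorithm.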