Biased Gradient Estimate With Drastic Variance Reduction For Meta Reinforcement Learning
2021 · Yunhao Tang
Abstract
Despite the empirical success of meta reinforcement learning (meta-RL), there are still a number of poorly understood discrepancies between theory and practice. Critically, biased gradient estimates are almost always implemented in practice, whereas prior theory on meta-RL only establishes convergence under unbiased gradient estimates. In this work, we investigate this discrepancy. In particular, (1) we show that unbiased gradient estimates have variance \(\Theta(N)\), which depends linearly on the sample size \(N\) of the inner loop updates; (2) we propose linearized score function (LSF) gradient estimates, which have bias \(\mathcal{O}(1/\sqrt{N})\) and variance \(\mathcal{O}(1/N)\); (3) we show that most empirical prior work in fact implements variants of the LSF gradient estimates, which implies that practical algorithms "accidentally" introduce bias to achieve better performance; (4) we establish theoretical guarantees for the LSF gradient estimates in meta-RL regarding its con
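The linear variance growth in claim (1) can be illustrated on a toy problem. The sketch below is not the paper's estimator or its meta-RL setting. It assumes a one-dimensional Gaussian sampler \(x_i \sim \mathcal{N}(\theta, 1)\), a hypothetical inner-loop update \(\theta' = \theta + \eta \cdot \frac{1}{N}\sum_i x_i\), and a hypothetical outer objective \(f(\theta') = \theta'^2\). It then measures the empirical variance of the score function term \(f(\theta') \sum_{i=1}^{N} \nabla_\theta \log p_\theta(x_i)\), which sums \(N\) independent score functions. The function name and all parameters are assumptions made for illustration.

```python
import numpy as np


def score_term_variance(n_inner, theta=1.0, step_size=0.1, n_trials=10000, seed=0):
    """Empirical variance of a toy score function term over n_inner samples.

    Toy assumptions (not from the paper): samples are x_i ~ N(theta, 1),
    the inner update is theta' = theta + step_size * mean(x_i), and the
    outer objective is f(theta') = theta'**2.
    """
    rng = np.random.default_rng(seed)
    # Each row is one independent inner loop with n_inner samples.
    x = rng.normal(loc=theta, scale=1.0, size=(n_trials, n_inner))
    # Score of N(theta, 1) with respect to theta: d/dtheta log p(x) = x - theta.
    scores = x - theta
    # Hypothetical inner-loop update and outer objective.
    theta_adapted = theta + step_size * x.mean(axis=1)
    outer_value = theta_adapted ** 2
    # Score function term: outer value times the sum of the n_inner scores.
    term = outer_value * scores.sum(axis=1)
    return float(term.var())


if __name__ == "__main__":
    for n in (16, 64, 256):
        print(n, score_term_variance(n))
```

In this toy setting the sum of scores has variance \(N\) while \(f(\theta')\) concentrates around a nonzero constant, so the measured variance should grow roughly in proportion to `n_inner`. The LSF estimate itself is not implemented here, since its definition is not given in the abstract.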