One Step At A Time: Pros And Cons Of Multi-step Meta-gradient Reinforcement Learning
2021 · Clément Bonnet, Paul Caron, Thomas Barrett, et al.
Abstract
Self-tuning algorithms that adapt the learning process online encourage more effective and robust learning. Among all the methods available, meta-gradients have emerged as a promising approach. They leverage the differentiability of the learning rule with respect to some hyper-parameters to adapt them in an online fashion. Although meta-gradients can be accumulated over multiple learning steps to avoid myopic updates, this is rarely used in practice. In this work, we demonstrate that whilst multi-step meta-gradients do provide a better learning signal in expectation, this comes at the cost of a significant increase in variance, hindering performance. In the light of this analysis, we introduce a novel method mixing multiple inner steps that enjoys a more accurate and robust meta-gradient signal, essentially trading off bias and variance in meta-gradient estimation. When applied to the Snake game, the mixing meta-gradient algorithm can cut the variance by a factor of 3 while achieving s
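To make the idea of differentiating the learning rule with respect to a hyper-parameter concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm. The quadratic loss, the curvature `a`, the function names, and the choice of the step size `alpha` as the tuned hyper-parameter are all assumptions made for illustration. The sketch runs `k` gradient-descent steps, propagates the sensitivity of the parameter to `alpha` through every step, and compares the resulting k-step meta-gradient with its closed form.

```python
def k_step_meta_gradient(theta0, alpha, a, k):
    """Return (theta_k, dJ/dalpha) after k gradient steps on L(theta) = 0.5 * a * theta**2.

    Toy setting (assumed, not from the paper): the outer objective J is the
    same quadratic loss evaluated at the final parameter theta_k.
    """
    theta = theta0
    dtheta_dalpha = 0.0  # sensitivity of the current parameter to alpha
    for _ in range(k):
        grad = a * theta
        # Differentiate the update theta <- theta - alpha * grad w.r.t. alpha.
        dtheta_dalpha = (1.0 - alpha * a) * dtheta_dalpha - grad
        theta = theta - alpha * grad
    # Chain rule: dJ/dalpha = dJ/dtheta_k * dtheta_k/dalpha.
    return theta, a * theta * dtheta_dalpha


def closed_form_meta_gradient(theta0, alpha, a, k):
    """Analytic dJ/dalpha, using theta_k = (1 - alpha * a)**k * theta0."""
    return -k * a**2 * theta0**2 * (1.0 - alpha * a) ** (2 * k - 1)


if __name__ == "__main__":
    for k in (1, 3, 5):
        _, meta_grad = k_step_meta_gradient(theta0=1.0, alpha=0.1, a=2.0, k=k)
        print(k, meta_grad, closed_form_meta_gradient(1.0, 0.1, 2.0, k))
```

With `k = 1` the meta-gradient only sees the immediate effect of `alpha`, which is the myopic case the abstract mentions. Larger `k` accounts for the effect across several updates. This toy example is deterministic, so it does not exhibit the variance increase the abstract attributes to multi-step meta-gradients; that would only appear once the inner gradients are estimated from sampled data.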
Related papers
- Metatrace Actor-critic: Online Step-size Tuning By Meta-gradient Descent For Reinforcement Learning Control (2018)
- Meta-gradient Reinforcement Learning With An Objective Discovered Online (2020)
- On The Effectiveness Of Fine-tuning Versus Meta-reinforcement Learning (2022)
- Debiasing Meta-gradient Reinforcement Learning By Learning The Outer Value Function (2022)
- Meta-value Learning: A General Framework For Learning With Learning Awareness (2023)
- Improving Generalization In Meta Reinforcement Learning Using Learned Objectives (2019)
- Stepsize Learning For Policy Gradient Methods In Contextual Markov Decision Processes (2023)
- Biased Gradient Estimate With Drastic Variance Reduction For Meta Reinforcement Learning (2021)