Meta-gradient Reinforcement Learning With An Objective Discovered Online
2020 · Zhongwen Xu, Hado van Hasselt, Matteo Hessel, et al.
Abstract
Deep reinforcement learning includes a broad family of algorithms that parameterise an internal representation, such as a value function or policy, by a deep neural network. Each algorithm optimises its parameters with respect to an objective, such as Q-learning or policy gradient, that defines its semantics. In this work, we propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network, solely from interactive experience with its environment. Over time, this allows the agent to learn how to learn increasingly effectively. Furthermore, because the objective is discovered online, it can adapt to changes over time. We demonstrate that the algorithm discovers how to address several important issues in RL, such as bootstrapping, non-stationarity, and off-policy learning. On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency, eventually outperforming the median
Related papers
- Improving Generalization In Meta Reinforcement Learning Using Learned Objectives (2019)
- Enhancing Online Reinforcement Learning With Meta-learned Objective From Offline Data (2025)
- Learning To Reinforcement Learn (2016)
- Discovering General Reinforcement Learning Algorithms With Adversarial Environment Design (2023)
- A Tutorial On Meta-reinforcement Learning (2023)
- Metatrace Actor-critic: Online Step-size Tuning By Meta-gradient Descent For Reinforcement Learning Control (2018)
- Deep Online Learning Via Meta-learning: Continual Adaptation For Model-based RL (2018)
- Discovering Reinforcement Learning Algorithms (2020)