Discovering Temporally-aware Reinforcement Learning Algorithms
2024 · Matthew Thomas Jackson, Chris Lu, Louis Kirsch, et al.
Abstract
Recent advancements in meta-learning have enabled the automatic discovery of novel reinforcement learning algorithms parameterized by surrogate objective functions. To improve upon manually designed algorithms, the parameterization of this learned objective function must be expressive enough to represent novel principles of learning (instead of merely recovering already established ones) while still generalizing to a wide range of settings outside of its meta-training distribution. However, existing methods focus on discovering objective functions that, like many widely used objective functions in reinforcement learning, do not take into account the total number of steps allowed for training, or "training horizon". In contrast, humans use a plethora of different learning objectives across the course of acquiring a new ability. For instance, students may alter their studying techniques based on the proximity to exam deadlines and their self-assessed capabilities. This paper contends tha
Related papers
- Meta-gradient Reinforcement Learning With An Objective Discovered Online (2020)
- Discovering Reinforcement Learning Algorithms (2020)
- Unsupervised Meta-learning For Reinforcement Learning (2018)
- Improving Generalization In Meta Reinforcement Learning Using Learned Objectives (2019)
- Discovering General Reinforcement Learning Algorithms With Adversarial Environment Design (2023)
- Evolving Reinforcement Learning Algorithms (2021)
- Reinforcement Teaching (2022)
- Online Meta-learning By Parallel Algorithm Competition (2017)