Discovering General Reinforcement Learning Algorithms With Adversarial Environment Design
2023 · Matthew Thomas Jackson, Minqi Jiang, Jack Parker-Holder, et al.
Abstract
The past decade has seen vast progress in deep reinforcement learning (RL) on the back of algorithms manually designed by human researchers. Recently, it has been shown that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks. Despite impressive initial results from algorithms such as Learned Policy Gradient (LPG), there remains a generalization gap when these algorithms are applied to unseen environments. In this work, we examine how characteristics of the meta-training distribution impact the generalization performance of these algorithms. Motivated by this analysis and building on ideas from Unsupervised Environment Design (UED), we propose a novel approach for automatically generating curricula to maximize the regret of a meta-learned optimizer, in addition to a novel approximation of regret, which we name algorithmic regret (AR). The result is our method, General RL Optimizers Obtained Via Environment
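A regret-maximizing curriculum can be made concrete with a generic UED-style level sampler. The sketch below is not the paper's method. It assumes that regret on a level is approximated by the return gap between a reference agent and the learned optimizer, which is the antagonist-style estimate used in PAIRED. It also prioritizes levels by regret rank, as in Prioritized Level Replay. `Level`, `approx_regret`, `rank_priorities`, and `sample_level` are hypothetical names, and the default temperature and example returns are arbitrary.

```python
import random
from dataclasses import dataclass


@dataclass
class Level:
    """Hypothetical environment configuration (level) with its latest regret estimate."""
    level_id: int
    regret: float = 0.0


def approx_regret(reference_return: float, learner_return: float) -> float:
    """Approximate regret as the return gap between a reference agent and the learner.

    Assumption: the reference agent plays the role of PAIRED's antagonist. This is
    not necessarily how the paper defines algorithmic regret.
    """
    return reference_return - learner_return


def rank_priorities(levels: list[Level], temperature: float = 0.3) -> list[float]:
    """Rank-based sampling probabilities in the style of Prioritized Level Replay.

    The highest-regret level gets rank 1. Each level's weight is
    (1 / rank) ** (1 / temperature), and the weights are normalized to sum to 1.
    """
    order = sorted(range(len(levels)), key=lambda i: levels[i].regret, reverse=True)
    weights = [0.0] * len(levels)
    for rank, idx in enumerate(order, start=1):
        weights[idx] = (1.0 / rank) ** (1.0 / temperature)
    total = sum(weights)
    return [w / total for w in weights]


def sample_level(levels: list[Level], rng: random.Random, temperature: float = 0.3) -> Level:
    """Sample the next training level, favouring levels with high estimated regret."""
    probs = rank_priorities(levels, temperature)
    return rng.choices(levels, weights=probs, k=1)[0]


if __name__ == "__main__":
    buffer = [Level(i) for i in range(4)]
    # Hypothetical (reference agent, learned optimizer) returns for each level.
    returns = [(1.0, 0.9), (1.0, 0.2), (0.5, 0.5), (0.8, 0.4)]
    for level, (ref, learner) in zip(buffer, returns):
        level.regret = approx_regret(ref, learner)
    probs = rank_priorities(buffer)
    print(max(range(len(buffer)), key=lambda i: probs[i]))  # prints 1 (largest gap)
    print(sample_level(buffer, random.Random(0)).level_id)
```

Rank-based prioritization depends only on the ordering of regret estimates, not on their scale. This keeps sampling stable when the return gaps of different levels vary widely in magnitude. Lowering `temperature` concentrates sampling on the highest-regret levels.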
Related papers
- Discovering Reinforcement Learning Algorithms (2020)
- Replay-guided Adversarial Environment Design (2021)
- Improving Generalization In Meta Reinforcement Learning Using Learned Objectives (2019)
- Meta-gradient Reinforcement Learning With An Objective Discovered Online (2020)
- Discovering Minimal Reinforcement Learning Environments (2024)
- Boosting Exploration In Multi-task Reinforcement Learning Using Adversarial Networks (2022)
- Policy Gradient RL Algorithms As Directed Acyclic Graphs (2020)
- Improving Generalization To New Environments And Removing Catastrophic Forgetting In Reinforcement Learning By Using An Eco-system Of Agents (2022)