Online Meta-learning By Parallel Algorithm Competition
2017 · Stefan Elfwing, Eiji Uchibe, Kenji Doya
Abstract
The efficiency of reinforcement learning algorithms depends critically on a few meta-parameters that modulate the learning updates and the trade-off between exploration and exploitation. The adaptation of the meta-parameters is an open question in reinforcement learning, which arguably has become more of an issue recently with the success of deep reinforcement learning in high-dimensional state spaces. The long learning times in domains such as Atari 2600 video games make it infeasible to perform comprehensive searches for appropriate meta-parameter values. We propose the Online Meta-learning by Parallel Algorithm Competition (OMPAC) method. In the OMPAC method, several instances of a reinforcement learning algorithm are run in parallel with small differences in the initial values of the meta-parameters. After a fixed number of episodes, the instances are selected based on their performance in the task at hand. Before continuing the learning, Gaussian noise is added to the meta-para
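The select-then-perturb step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract only says instances are "selected based on their performance", so the truncation selection used here (keep the better half, copy it to refill the population) is an assumption. The function name `ompac_generation` and the arguments `scores` and `noise_std` are hypothetical. A real instance would also carry its learned parameters along with its meta-parameters; that part is omitted.

```python
import random


def ompac_generation(population, scores, noise_std, rng):
    """One selection-and-perturbation step over meta-parameter settings.

    population: list of dicts mapping meta-parameter name -> value,
                one dict per parallel learner instance.
    scores:     performance of each instance over the last block of
                episodes (higher is better).
    noise_std:  dict mapping meta-parameter name -> std. dev. of the
                Gaussian noise added after selection.
    rng:        a random.Random instance, passed in for reproducibility.
    """
    # Rank instance indices by score, best first.
    ranked = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)

    # ASSUMPTION: truncation selection, keeping the better half.
    # The abstract does not specify the selection rule.
    n_keep = max(1, len(population) // 2)
    survivors = [population[i] for i in ranked[:n_keep]]

    # Refill the population by cycling through the survivors, adding
    # Gaussian noise to every meta-parameter of each copy.
    new_population = []
    for k in range(len(population)):
        parent = survivors[k % n_keep]
        child = {
            name: value + rng.gauss(0.0, noise_std[name])
            for name, value in parent.items()
        }
        new_population.append(child)
    return new_population
```

In use, each generation would run every instance for the fixed number of episodes, collect one score per instance, and call `ompac_generation` before learning continues. Passing a seeded `random.Random` keeps runs repeatable.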
Related papers
- Meta-gradient Reinforcement Learning With An Objective Discovered Online (2020)
- Discovering Temporally-aware Reinforcement Learning Algorithms (2024)
- One Step At A Time: Pros And Cons Of Multi-step Meta-gradient Reinforcement Learning (2021)
- Adapting Behaviour For Learning Progress (2019)
- Neural Auto-curricula (2021)
- Massively Multiagent Minigames For Training Generalist Agents (2024)
- Unsupervised Meta-learning For Reinforcement Learning (2018)
- Learning Large-scale Competitive Team Behaviors With Mean-field Interactions And Online Opponent Modeling (2025)