First-explore, Then Exploit: Meta-learning To Solve Hard Exploration-exploitation Trade-offs
2023 · Ben Norman, Jeff Clune
Abstract
Standard reinforcement learning (RL) agents never intelligently explore like a human (i.e. taking into account complex domain priors and adapting quickly based on previous exploration). Across episodes, RL agents struggle to perform even simple exploration strategies, for example systematic search that avoids exploring the same location multiple times. This poor exploration limits performance on challenging domains. Meta-RL is a potential solution, as unlike standard RL, meta-RL can learn to explore, and potentially learn highly complex strategies far beyond those of standard RL, strategies such as experimenting in early episodes to learn new skills, or conducting experiments to learn about the current environment. Traditional meta-RL focuses on the problem of learning to optimally balance exploration and exploitation to maximize the cumulative reward of the episode sequence (e.g., aiming to maximize the total wins in a tournament -- while also improving as a player). We identify a new
Related papers
- Decoupling Exploration And Exploitation For Meta-reinforcement Learning Without Sacrifices (2020)
- Exploitation Is All You Need... For Exploration (2025)
- Boosting Exploration In Multi-task Reinforcement Learning Using Adversarial Networks (2022)
- Learn The Ropes, Then Trust The Wins: Self-imitation With Progressive Exploration For Agentic Reinforcement Learning (2025)
- Improving Generalization In Meta Reinforcement Learning Using Learned Objectives (2019)
- Go-explore: A New Approach For Hard-exploration Problems (2019)
- Offline Meta Learning Of Exploration (2020)
- MULEX: Disentangling Exploitation From Exploration In Deep RL (2019)