Go-Explore: A New Approach for Hard-Exploration Problems
2019 · Adrien Ecoffet, Joost Huizinga, Joel Lehman, et al.
Abstract
A grand challenge in reinforcement learning is intelligent exploration, especially when rewards are sparse or deceptive. Two Atari games serve as benchmarks for such hard-exploration domains: Montezuma's Revenge and Pitfall. On both games, current RL algorithms perform poorly, even those with intrinsic motivation, which is the dominant method to improve performance on hard-exploration domains. To address this shortfall, we introduce a new algorithm called Go-Explore. It exploits the following principles: (1) remember previously visited states, (2) first return to a promising state (without exploration), then explore from it, and (3) solve simulated environments through any available means (including by introducing determinism), then robustify via imitation learning. The combined effect of these principles is a dramatic performance improvement on hard-exploration problems. On Montezuma's Revenge, Go-Explore scores a mean of over 43k points, almost 4 times the previous state of the art.
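To make principles (1) and (2) concrete, here is a minimal sketch of an archive-based "return, then explore" loop. It is not the authors' implementation. The corridor environment, the `step` and `go_explore` names, the cell definition (the agent's position), and the `1 / sqrt(visits + 1)` selection weight are all assumptions chosen for illustration. The robustification step in principle (3) is not shown.

```python
import random

GOAL = 30  # hypothetical corridor length; reward exists only at the far end


def step(pos, action):
    """Deterministic toy dynamics: move one cell left (-1) or right (+1)."""
    new_pos = max(0, min(GOAL, pos + action))
    reward = 1.0 if new_pos == GOAL else 0.0
    return new_pos, reward


def go_explore(iterations=5000, explore_steps=10, seed=0):
    rng = random.Random(seed)
    # Principle (1): remember visited states. Each cell keeps a restorable
    # state, the action sequence that reached it, its score, and a visit count.
    archive = {0: {"state": 0, "traj": [], "score": 0.0, "visits": 0}}
    for _ in range(iterations):
        # Pick a non-terminal cell, favouring rarely chosen ones (assumed heuristic).
        cells = [c for c in archive if c != GOAL]
        weights = [1.0 / (archive[c]["visits"] + 1) ** 0.5 for c in cells]
        entry = archive[rng.choices(cells, weights=weights)[0]]
        entry["visits"] += 1
        # Principle (2): return first, with no exploration, by restoring the
        # saved state. This is only possible because the toy is deterministic.
        pos, traj, score = entry["state"], list(entry["traj"]), entry["score"]
        # Then explore from there with random actions.
        for _ in range(explore_steps):
            action = rng.choice((-1, 1))
            pos, reward = step(pos, action)
            traj.append(action)
            score += reward
            known = archive.get(pos)
            better = (
                known is None
                or score > known["score"]
                or (score == known["score"] and len(traj) < len(known["traj"]))
            )
            if better:
                # New cell, or a higher-scoring / shorter route to a known cell.
                archive[pos] = {"state": pos, "traj": list(traj),
                                "score": score, "visits": 0}
            if pos == GOAL:
                break  # treat the goal as terminal in this sketch
    return archive


if __name__ == "__main__":
    result = go_explore()
    best = result.get(GOAL)
    print("goal reached:", best is not None)
    if best is not None:
        print("trajectory length:", len(best["traj"]))
```

Because the loop always restarts from a remembered cell rather than from the start, the frontier of discovered cells keeps advancing even though a short random walk from position 0 would almost never reach the reward.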
Related papers
- First Go, Then Post-explore: The Benefits Of Post-exploration In Intrinsic Motivation (2022)
- Explore-go: Leveraging Exploration For Generalisation In Deep Reinforcement Learning (2024)
- First-explore, Then Exploit: Meta-learning To Solve Hard Exploration-exploitation Trade-offs (2023)
- Exploration In Feature Space For Reinforcement Learning (2017)
- GAN-based Intrinsic Exploration For Sample Efficient Reinforcement Learning (2022)
- On Hard Exploration For Reinforcement Learning: A Case Study In Pommerman (2019)
- Generative Adversarial Exploration For Reinforcement Learning (2022)
- Redeeming Intrinsic Rewards Via Constrained Optimization (2022)