Is Deep Reinforcement Learning Really Superhuman On Atari? Leveling The Playing Field
2019 · Marin Toromanoff, Emilie Wirbel, Fabien Moutarde
Abstract
Consistent and reproducible evaluation of Deep Reinforcement Learning (DRL) is not straightforward. In the Arcade Learning Environment (ALE), small changes in environment parameters such as stochasticity or the maximum allowed play time can lead to very different performance. In this work, we discuss the difficulties of comparing different agents trained on ALE. In order to take a step further towards reproducible and comparable DRL, we introduce SABER, a Standardized Atari BEnchmark for general Reinforcement learning algorithms. Our methodology extends previous recommendations and contains a complete set of environment parameters as well as train and test procedures. We then use SABER to evaluate the current state of the art, Rainbow. Furthermore, we introduce a human world records baseline, and argue that previous claims of expert or superhuman performance of DRL might not be accurate. Finally, we propose Rainbow-IQN by extending Rainbow with Implicit Quantile Networks (IQN) leading
Related papers
- A Review For Deep Reinforcement Learning In Atari: Benchmarks, Challenges, And Solutions (2021)
- Deep Reinforcement Learning At The Edge Of The Statistical Precipice (2021)
- Importance Of Using Appropriate Baselines For Evaluation Of Data-efficiency In Deep Reinforcement Learning For Atari (2020)
- A Human Mixed Strategy Approach To Deep Reinforcement Learning (2018)
- Revisiting Rainbow: Promoting More Insightful And Inclusive Deep Reinforcement Learning Research (2020)
- Revisiting The Arcade Learning Environment: Evaluation Protocols And Open Problems For General Agents (2017)
- Learn To Interpret Atari Agents (2018)
- Reward Learning From Human Preferences And Demonstrations In Atari (2018)