Analysis Of Hyper-parameters For Small Games: Iterations Or Epochs In Self-play?
2020 · Hui Wang, Michael Emmerich, Mike Preuss, et al.
Abstract
The landmark achievements of AlphaGo Zero have created great research interest in self-play in reinforcement learning. In self-play, Monte Carlo Tree Search is used to train a deep neural network, which is then used in tree searches. Training itself is governed by many hyper-parameters. There has been surprisingly little research on design choices for hyper-parameter values and loss functions, presumably because of the prohibitive computational cost of exploring the parameter space. In this paper, we investigate 12 hyper-parameters in an AlphaZero-like self-play algorithm and evaluate how these parameters contribute to training. We use small games to achieve meaningful exploration with moderate computational effort. The experimental results show that training is highly sensitive to hyper-parameter choices. Through multi-objective analysis we identify 4 important hyper-parameters to further assess. To start, we find surprising results where too much training can sometimes lead to lower pe
Related papers
- Targeted Search Control In AlphaZero For Effective Policy Improvement (2023)
- Impartial Games: A Challenge For Reinforcement Learning (2022)
- Adaptable Hindsight Experience Replay For Search-based Learning (2025)
- Efficient Competitive Self-play Policy Optimization (2020)
- Combining Off And On-policy Training In Model-based Reinforcement Learning (2021)
- Score Vs. Winrate In Score-based Games: Which Reward For Reinforcement Learning? (2022)
- Scaling Laws For A Multi-agent Reinforcement Learning Model (2022)
- Policy-value Alignment And Robustness In Search-based Multi-agent Learning (2023)