Multiple Policy Value Monte Carlo Tree Search
2019 · Li-Cheng Lan, Wei Li, Ting-Han Wei, et al.
Abstract
Many of the strongest game playing programs use a combination of Monte Carlo tree search (MCTS) and deep neural networks (DNN), where the DNNs are used as policy or value evaluators. Given a limited budget, such as online playing or during the self-play phase of AlphaZero (AZ) training, a balance needs to be reached between accurate state estimation and more MCTS simulations, both of which are critical for a strong game playing agent. Typically, larger DNNs are better at generalization and accurate evaluation, while smaller DNNs are less costly, and therefore can lead to more MCTS simulations and bigger search trees with the same budget. This paper introduces a new method called the multiple policy value MCTS (MPV-MCTS), which combines multiple policy value neural networks (PV-NNs) of various sizes to retain advantages of each network, where two PV-NNs f_S and f_L are used in this paper. We show through experiments on the game NoGo that a combined f_S and f_L MPV-MCTS outperforms singl
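The budget trade-off described in the abstract can be illustrated with a little arithmetic. The following is a minimal sketch, not the paper's algorithm: the function names, the cost figures, and the `large_fraction` split are all hypothetical and chosen only to show how a fixed budget translates into simulation counts for a cheap network versus an expensive one.

```python
def simulations_within_budget(budget: float, cost_per_eval: float) -> int:
    """Count how many network evaluations (one per MCTS simulation) fit in a budget."""
    if cost_per_eval <= 0:
        raise ValueError("cost_per_eval must be positive")
    return int(budget // cost_per_eval)


def split_budget(budget: float, cost_small: float, cost_large: float,
                 large_fraction: float) -> tuple[int, int]:
    """Split one budget between a small and a large network.

    Hypothetical helper: `large_fraction` is the share of the budget reserved
    for the large network. Returns (n_small, n_large) simulation counts.
    """
    if not 0.0 <= large_fraction <= 1.0:
        raise ValueError("large_fraction must lie in [0, 1]")
    n_large = simulations_within_budget(budget * large_fraction, cost_large)
    # Whatever the large network did not spend goes to the small network.
    remaining = budget - n_large * cost_large
    n_small = simulations_within_budget(remaining, cost_small)
    return n_small, n_large


if __name__ == "__main__":
    # Assumed costs: the large network is 8x as expensive per evaluation.
    budget, cost_small, cost_large = 800.0, 1.0, 8.0
    print("small only:", simulations_within_budget(budget, cost_small))
    print("large only:", simulations_within_budget(budget, cost_large))
    print("even split (small, large):",
          split_budget(budget, cost_small, cost_large, 0.5))
```

Under these assumed costs, the small network alone affords eight times as many simulations as the large one, while a mixed allocation keeps a sizeable search tree and still obtains some of the more accurate evaluations. How the two networks' results are actually combined inside the tree is specified in the paper, not here.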
Related papers
- Combining Off And On-policy Training In Model-based Reinforcement Learning (2021)
- Learning Policies From Self-play With Policy Gradients And MCTS Value Estimates (2019)
- Policy Gradient Search: Online Planning And Expert Iteration Without Search Trees (2019)
- Policy Gradient Algorithms With Monte Carlo Tree Learning For Non-Markov Decision Processes (2022)
- Variance-aware Prior-based Tree Policies For Monte Carlo Tree Search (2026)
- Decision Making In Non-stationary Environments With Policy-augmented Monte Carlo Tree Search (2022)
- Ordinal Monte Carlo Tree Search (2019)
- Policy-value Alignment And Robustness In Search-based Multi-agent Learning (2023)