Policy Gradient Search: Online Planning And Expert Iteration Without Search Trees
2019 · Thomas Anthony, Robert Nishihara, Philipp Moritz, et al.
Abstract
Monte Carlo Tree Search (MCTS) algorithms perform simulation-based search to improve policies online. During search, the simulation policy is adapted to explore the most promising lines of play. MCTS has been used by state-of-the-art programs for many problems. A disadvantage of MCTS, however, is that it estimates the values of states with Monte Carlo averages stored in a search tree, which does not scale to games with very high branching factors. We propose an alternative simulation-based search method, Policy Gradient Search (PGS), which adapts a neural network simulation policy online via policy gradient updates, avoiding the need for a search tree. In Hex, PGS achieves comparable performance to MCTS, and an agent trained using Expert Iteration with PGS was able to defeat MoHex 2.0, the strongest open-source Hex agent, in 9x9 Hex.
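To make the core idea concrete, the following is a minimal sketch of simulation-based search in which the simulation policy is adapted by policy gradient (REINFORCE) updates instead of by statistics stored in a search tree. It is an illustration under stated assumptions, not the authors' implementation: the toy single-player game, the `TARGET` line of play, the `reward` function, the running-mean baseline, and the choice to play the most probable root action are all hypothetical, and a table of softmax logits stands in for the neural network policy described in the abstract.

```python
import math
import random
from collections import defaultdict

ACTIONS = (0, 1, 2)
TARGET = (2, 0, 1)  # hypothetical best line of play in this toy game


def reward(path):
    """Terminal return: fraction of moves matching the hidden target line."""
    return sum(a == t for a, t in zip(path, TARGET)) / len(TARGET)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_search(n_sims=3000, lr=0.2, seed=0):
    """Adapt a softmax simulation policy online with REINFORCE updates.

    Assumption: per-state logits replace the neural network, so nothing
    generalises between states; only the update rule is illustrated.
    """
    rng = random.Random(seed)
    theta = defaultdict(lambda: [0.0] * len(ACTIONS))  # logits per state
    baseline = 0.0  # running mean of returns, used to reduce variance

    for i in range(1, n_sims + 1):
        # Simulate one full game from the root with the current policy.
        state, trajectory = (), []
        while len(state) < len(TARGET):
            probs = softmax(theta[state])
            action = rng.choices(ACTIONS, weights=probs)[0]
            trajectory.append((state, action, probs))
            state = state + (action,)

        # REINFORCE: for softmax logits, d log pi(a|s) / d theta[s][b]
        # equals 1[a == b] - pi(b|s).
        z = reward(state)
        advantage = z - baseline
        for s, a, probs in trajectory:
            for b in ACTIONS:
                grad = (1.0 if b == a else 0.0) - probs[b]
                theta[s][b] += lr * advantage * grad
        baseline += (z - baseline) / i

    # Assumption: act by taking the most probable root action afterwards.
    root_probs = softmax(theta[()])
    best = max(ACTIONS, key=lambda a: root_probs[a])
    return best, root_probs


if __name__ == "__main__":
    best_action, root_probs = policy_gradient_search()
    print("chosen root action:", best_action)
    print("adapted root policy:", [round(p, 3) for p in root_probs])
```

Memory in this sketch grows only with the states the simulations actually visit, and with a parametric policy it would not grow at all. That property is what the abstract contrasts with per-node Monte Carlo averages in a search tree.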
Related papers
- Policy Gradient Algorithms With Monte Carlo Tree Learning For Non-Markov Decision Processes (2022)
- SoftTreeMax: Policy Gradient With Tree Search (2022)
- Learning Policies From Self-play With Policy Gradients And MCTS Value Estimates (2019)
- Multiple Policy Value Monte Carlo Tree Search (2019)
- Decision Making In Non-stationary Environments With Policy-augmented Search (2024)
- Decision Making In Non-stationary Environments With Policy-augmented Monte Carlo Tree Search (2022)
- Variance-aware Prior-based Tree Policies For Monte Carlo Tree Search (2026)
- Gradient-aware Model-based Policy Search (2019)