Ordinal Monte Carlo Tree Search
2019 · Tobias Joppen, Johannes Fürnkranz
Abstract
In many problem settings, most notably in game playing, an agent receives a possibly delayed reward for its actions. Often, those rewards are handcrafted and not naturally given. Even simple terminal-only rewards, like winning equals 1 and losing equals -1, cannot be seen as an unbiased statement, since these values are chosen arbitrarily, and the behavior of the learner may change with different encodings, such as setting the value of a loss to -0.5, which is often done in practice to encourage learning. It is hard to argue about good rewards, and the performance of an agent often depends on the design of the reward signal. In particular, in domains where states by nature only have an ordinal ranking and where meaningful distance information between game state values is not available, a numerical reward signal is necessarily biased. In this paper, we take a look at Monte Carlo Tree Search (MCTS), a popular algorithm to solve MDPs, highlight a recurring problem concerning its use of
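The abstract's claim that a learner's behavior can depend on the reward encoding can be illustrated with a small arithmetic sketch. It is not taken from the paper: the third outcome (a draw worth 0), the two actions, and their outcome probabilities are hypothetical values chosen for illustration. Both encodings agree on the ordinal ranking win > draw > loss, yet an agent that maximizes the mean reward picks a different action under each.

```python
# Illustrative sketch only: the actions, probabilities, and the draw outcome
# are assumptions made for this example, not values from the paper.

def expected_value(outcome_probs, reward):
    """Mean reward of an action, given its outcome distribution."""
    return sum(p * reward[outcome] for outcome, p in outcome_probs.items())


def preferred_action(actions, reward):
    """Name of the action with the highest mean reward."""
    return max(actions, key=lambda name: expected_value(actions[name], reward))


# Two hypothetical actions: one always draws, one gambles on a win.
ACTIONS = {
    "safe": {"win": 0.0, "draw": 1.0, "loss": 0.0},
    "risky": {"win": 0.45, "draw": 0.0, "loss": 0.55},
}

# Two encodings with the same ordinal ranking: win > draw > loss.
LOSS_MINUS_ONE = {"win": 1.0, "draw": 0.0, "loss": -1.0}
LOSS_MINUS_HALF = {"win": 1.0, "draw": 0.0, "loss": -0.5}

if __name__ == "__main__":
    for label, reward in [("loss = -1", LOSS_MINUS_ONE),
                          ("loss = -0.5", LOSS_MINUS_HALF)]:
        print(label, "->", preferred_action(ACTIONS, reward))
```

With a loss worth -1, the risky action has a mean of about -0.10 and the safe action is preferred. With a loss worth -0.5, the risky action's mean rises to about 0.175 and the preference flips, even though the ordering of the outcomes is unchanged.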