Regret-Guided Search Control For Efficient Learning In AlphaZero
2026 · Yun-Jui Tsai, Wei-Yu Chen, Yan-Ru Ju, et al.
Abstract
Reinforcement learning (RL) agents achieve remarkable performance but remain far less learning-efficient than humans. While RL agents require extensive self-play games to extract useful signals, humans often need only a few games, improving rapidly by repeatedly revisiting states where mistakes occurred. This idea, known as search control, aims to restart from valuable states rather than always from the initial state. In AlphaZero, prior work Go-Exploit applies this idea by sampling past states from self-play or search trees, but it treats all states equally, regardless of their learning potential. We propose Regret-Guided Search Control (RGSC), which extends AlphaZero with a regret network that learns to identify high-regret states, where the agent's evaluation diverges most from the actual outcome. These states are collected from both self-play trajectories and MCTS nodes, stored in a prioritized regret buffer, and reused as new starting positions. Across 9x9 Go, 10x10 Othello, and 1
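The buffer-and-restart idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes regret is approximated by the absolute gap between a value estimate and the final outcome, that the buffer keeps only the highest-regret states up to a fixed capacity, and that restart states are sampled in proportion to regret. The names `regret`, `RegretBuffer`, `choose_start`, and the parameters `capacity` and `restart_prob` are hypothetical, and the learned regret network is replaced here by a directly computed score.

```python
import random
from dataclasses import dataclass, field


def regret(value_estimate: float, outcome: float) -> float:
    """Assumed regret proxy: gap between the agent's evaluation and the game result."""
    return abs(value_estimate - outcome)


@dataclass
class RegretBuffer:
    """Hypothetical prioritized buffer that keeps the highest-regret states."""
    capacity: int = 1000
    items: list = field(default_factory=list)  # (regret, state) pairs, highest regret first

    def add(self, state, state_regret: float) -> None:
        self.items.append((state_regret, state))
        # Sort by regret only, so states never need to be comparable.
        self.items.sort(key=lambda pair: pair[0], reverse=True)
        del self.items[self.capacity:]  # evict the lowest-regret entries

    def sample(self, rng: random.Random):
        # Sample proportionally to regret; eps keeps the total weight positive.
        eps = 1e-6
        weights = [r + eps for r, _ in self.items]
        states = [s for _, s in self.items]
        return rng.choices(states, weights=weights, k=1)[0]


def choose_start(buffer: RegretBuffer, initial_state, restart_prob: float, rng: random.Random):
    """Start a self-play game from a buffered state with probability restart_prob."""
    if buffer.items and rng.random() < restart_prob:
        return buffer.sample(rng)
    return initial_state


if __name__ == "__main__":
    rng = random.Random(0)
    buf = RegretBuffer(capacity=2)
    buf.add("s1", regret(0.9, -1.0))   # evaluation said winning, game was lost
    buf.add("s2", regret(0.1, 0.0))    # evaluation was nearly right
    buf.add("s3", regret(-0.5, 1.0))   # evaluation said losing, game was won
    print([s for _, s in buf.items])
    print(choose_start(buf, "initial", restart_prob=0.5, rng=rng))
```

With `capacity=2`, the low-regret state `s2` is evicted, so later restarts come only from states where the evaluation was badly wrong. Mixing buffered restarts with ordinary starts through `restart_prob` is a design assumption of this sketch, meant to keep games from the initial state in the training data.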
Related papers
- Targeted Search Control In AlphaZero For Effective Policy Improvement (2023)
- Adaptable Hindsight Experience Replay For Search-based Learning (2025)
- Modeling Strong And Human-like Gameplay With KL-regularized Search (2021)
- Impartial Games: A Challenge For Reinforcement Learning (2022)
- Go-Explore: A New Approach For Hard-exploration Problems (2019)
- Combining Deep Reinforcement Learning And Search For Imperfect-information Games (2020)
- Stepscorer: Accelerating Reinforcement Learning With Step-wise Scoring And Psychological Regret Modeling (2026)
- Reinforcement Learning In Strategy-based And Atari Games: A Review Of Google DeepMind's Innovations (2025)