Adaptable Hindsight Experience Replay For Search-based Learning
2025 · Alexandros Vazaios, Jannis Brugger, Cedric Derstroff, et al.
Abstract
AlphaZero-like Monte Carlo Tree Search systems, originally introduced for two-player games, dynamically balance exploration and exploitation using neural network guidance. This combination also makes them suitable for classical search problems. However, the original method of training the network on simulation results is limited in sparse-reward settings, especially in the early stages of training, when the network cannot yet provide useful guidance. Hindsight Experience Replay (HER) addresses this issue by relabeling unsuccessful trajectories from the search tree as supervised learning signals. We introduce Adaptable HER, a flexible framework that integrates HER with AlphaZero and allows easy adjustment of HER properties such as relabeled goals, policy targets, and trajectory selection. Our experiments, including equation discovery, show that the ability to modify HER is beneficial and that the approach surpasses the performance of pure supervised or reinforcement learning.
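To make the relabeling idea concrete, the following is a minimal sketch of generic hindsight relabeling, not the authors' implementation. It assumes a trajectory is given as a list of states and the actions taken between them, treats the last state reached as the hindsight goal, and uses a one-hot policy target with a discounted value target. The names `Sample` and `relabel_trajectory`, the `gamma` parameter, and the choice of targets are all hypothetical illustrations; the abstract states only that relabeled goals, policy targets, and trajectory selection are adjustable.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

State = Tuple[int, ...]


@dataclass(frozen=True)
class Sample:
    """One supervised training example produced by relabeling (hypothetical)."""
    state: State
    goal: State
    policy_target: Tuple[float, ...]  # one-hot over the action taken
    value_target: float


def relabel_trajectory(
    states: Sequence[State],
    actions: Sequence[int],
    num_actions: int,
    gamma: float = 1.0,
) -> List[Sample]:
    """Relabel a trajectory as if its final state had been the goal.

    `states` must contain one more entry than `actions`: states[t] is the
    state before actions[t], and states[-1] is the state finally reached.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected len(states) == len(actions) + 1")

    goal = states[-1]  # assumption: the hindsight goal is the final state
    steps = len(actions)
    samples: List[Sample] = []
    for t, (state, action) in enumerate(zip(states, actions)):
        # Policy target: the action actually taken led toward the new goal.
        one_hot = tuple(1.0 if i == action else 0.0 for i in range(num_actions))
        # Value target: 1.0 for the last step, discounted for earlier steps.
        value = gamma ** (steps - t - 1)
        samples.append(Sample(state, goal, one_hot, value))
    return samples


if __name__ == "__main__":
    # A failed search that ended in state (3,) instead of the intended goal.
    for sample in relabel_trajectory([(0,), (1,), (3,)], [1, 0], num_actions=2):
        print(sample)
```

In this sketch, the adjustable properties named in the abstract would correspond to swapping out how `goal` is chosen, how `one_hot` is built (for example, using visit counts from the search tree instead), and which trajectories are passed in at all.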
Related papers
- Regret-guided Search Control For Efficient Learning In Alphazero (2026)
- Bias-reduced Hindsight Experience Replay With Virtual Goal Prioritization (2019)
- Agenther: Hindsight Experience Replay For LLM Agent Trajectory Relabeling (2026)
- Combining Off And On-policy Training In Model-based Reinforcement Learning (2021)
- Targeted Search Control In Alphazero For Effective Policy Improvement (2023)
- Hindsight Experience Replay With Kronecker Product Approximate Curvature (2020)
- Higher: Improving Instruction Following With Hindsight Generation For Experience Replay (2019)
- Introspective Experience Replay: Look Back When Surprised (2022)