Spark: Strategic Policy-aware Exploration Via Dynamic Branching For Long-horizon Agentic Learning
2026 · Jinyang Wu, Shuo Yang, Changpeng Yang, et al.
Abstract
Reinforcement learning has empowered large language models to act as intelligent agents, yet training them for long-horizon tasks remains challenging because high-quality trajectories are scarce, especially under limited resources. Existing methods typically scale up rollout sizes and allocate computation indiscriminately across intermediate steps. Such approaches waste a substantial share of the computation budget on trivial steps while still failing to guarantee sample quality. To address this, we propose Spark (Strategic Policy-Aware exploRation via Key-state dynamic branching), a novel framework that selectively branches at critical decision states for resource-efficient exploration. Our key insight is to activate adaptive branching exploration at critical decision points to probe promising trajectories, thereby achieving precise resource allocation that prioritizes sampling quality over blind coverage. This design…
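To make the allocation idea concrete, here is a minimal Python sketch that contrasts uniform branching with key-state branching under the same extra sampling budget. The abstract does not say how Spark detects critical states, so the criticality test used here (policy entropy above a hypothetical `threshold`) and every name in the code (`key_state_branching`, `uniform_branching`, `extra_budget`, `max_branch`) are illustrative assumptions, not the authors' method.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def key_state_branching(step_probs, extra_budget, threshold, max_branch=4):
    """Return a branching factor (number of sampled continuations) per step.

    ASSUMPTION: a step counts as "critical" when the policy's entropy there
    exceeds `threshold`. Every step keeps its single on-policy continuation;
    extra branches go only to critical steps, highest entropy first, capped
    at `max_branch` per step, until `extra_budget` is spent.
    """
    factors = [1] * len(step_probs)
    scores = [entropy(p) for p in step_probs]
    critical = sorted(
        (i for i, s in enumerate(scores) if s > threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    remaining = extra_budget
    for i in critical:
        if remaining == 0:
            break
        grant = min(max_branch - 1, remaining)
        factors[i] += grant
        remaining -= grant
    return factors


def uniform_branching(num_steps, extra_budget):
    """Baseline: spread the same extra budget evenly over all steps."""
    base, rem = divmod(extra_budget, num_steps)
    return [1 + base + (1 if i < rem else 0) for i in range(num_steps)]


# Toy per-step action distributions along one trajectory prefix.
steps = [
    [0.97, 0.01, 0.01, 0.01],  # trivial: policy is nearly certain
    [0.25, 0.25, 0.25, 0.25],  # critical: policy is undecided
    [0.90, 0.05, 0.05],        # trivial
    [0.40, 0.35, 0.25],        # moderately uncertain
]

print(key_state_branching(steps, extra_budget=4, threshold=1.0))  # [1, 4, 1, 2]
print(uniform_branching(len(steps), extra_budget=4))              # [2, 2, 2, 2]
```

Both schemes sample eight continuations in total, but the uniform baseline spends branches on the near-deterministic steps 0 and 2, while the key-state rule concentrates them on the two uncertain steps.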
Related papers
- Learn The Ropes, Then Trust The Wins: Self-imitation With Progressive Exploration For Agentic Reinforcement Learning (2025)
- Dynamic Subgoal-based Exploration Via Bayesian Optimization (2019)
- LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven By Large Language Models (2025)
- Never Give Up: Learning Directed Exploration Strategies (2020)
- Learning Adaptive Exploration Strategies In Dynamic Environments Through Informed Policy Regularization (2020)
- Spacer: Self-play Anchoring With Centralized Reference Models (2025)
- Policy Augmentation: An Exploration Strategy For Faster Convergence Of Deep Reinforcement Learning Algorithms (2021)
- Improved Exploration Through Latent Trajectory Optimization In Deep Deterministic Policy Gradient (2019)