PORTool: Importance-aware Policy Optimization With Rewarded Tree For Multi-tool-integrated Reasoning
2026 · Feijie Wu, Weiwu Zhu, Yuxiang Zhang, et al.
Abstract
arXiv:2510.26020v2. Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents from outcome-only rewards suffers from credit-assignment ambiguity, obscuring which intermediate tool-use decisions drive success or failure. In this paper, we propose PORTool, an importance-aware policy-optimization algorithm that reinforces agents' tool-use competence from outcome-level supervision while assigning reward at the step level. Specifically, PORTool generates a rewarded rollout tree in which trajectories share prefixes before branching, enabling direct comparisons among alternative tool-use decisions within the same context. It then estimates each step's importance by a correctness-dominant signal, i.e., whether descendants of that step can ultimately produce a correct final answer, plus an auxiliary term indicating wheth
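The correctness-dominant signal mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's implementation: the `Step` class, the function names, and the binary 0/1 scoring are all hypothetical, and the auxiliary term is left out because the abstract is cut off before describing it. It only shows how sibling steps that share a prefix could be compared by whether any of their descendants reaches a correct final answer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    """One node of a toy rollout tree (hypothetical structure)."""
    label: str
    children: List["Step"] = field(default_factory=list)
    correct: Optional[bool] = None  # only meaningful on leaves (final answers)


def can_reach_correct(step: Step) -> bool:
    """True if this step, or any descendant, ends in a correct final answer."""
    if not step.children:
        return bool(step.correct)
    return any(can_reach_correct(child) for child in step.children)


def sibling_signals(parent: Step) -> Dict[str, float]:
    """Score each child of `parent` with 1.0 or 0.0 (assumed binary signal).

    All children share the same prefix (the path down to `parent`), so their
    scores compare alternative decisions made in the same context.
    """
    return {child.label: float(can_reach_correct(child)) for child in parent.children}


# Two alternative tool calls branching from the same question.
root = Step("question", [
    Step("call_search", [
        Step("answer_A", correct=True),
        Step("answer_B", correct=False),
    ]),
    Step("call_calculator", [
        Step("answer_C", correct=False),
    ]),
])

signals = sibling_signals(root)
print(signals)
```

Here `call_search` scores higher than `call_calculator` because one of its descendants is correct, even though only final answers were ever labeled.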
Related papers
- Adapt To Thrive! Adaptive Power-mean Policy Optimization For Improved LLM Reasoning (2026) 0.00
- Unified Policy Optimization For Continuous-action Reinforcement Learning In Non-stationary Tasks And Games (2022) 2.26
- Phgpo: Pheromone-guided Policy Optimization For Long-horizon Tool Planning (2026) 0.00
- ANO: A Principled Approach To Robust Policy Optimization (2026) 0.00
- Recode: Reinforcing Code Generation With Reasoning-process Rewards (2026) 0.00
- Toward Negotiable Reinforcement Learning: Shifting Priorities In Pareto Optimal Sequential Decision-making (2017) 0.00
- FP3O: Enabling Proximal Policy Optimization In Multi-agent Cooperation With Parameter-sharing Versatility (2023) 0.00
- Turn-PPO: Turn-level Advantage Estimation With PPO For Improved Multi-turn RL In Agentic LLMs (2025) 0.00