Abstract

arXiv:2510.26020v2 (Announce Type: replace-cross)

Multi-tool-integrated reasoning enables LLM-empowered tool-use agents to solve complex tasks by interleaving natural-language reasoning with calls to external tools. However, training such agents from outcome-only rewards suffers from credit-assignment ambiguity, obscuring which intermediate tool-use decisions drive success or failure. In this paper, we propose PORTool, an importance-aware policy-optimization algorithm that reinforces agents' tool-use competence from outcome-level supervision while assigning reward at the step level. Specifically, PORTool generates a rewarded rollout tree in which trajectories share prefixes before branching, enabling direct comparisons among alternative tool-use decisions within the same context. It then estimates each step's importance by a correctness-dominant signal, i.e., whether descendants of that step can ultimately produce a correct final answer, plus an auxiliary term indicating whether …
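The correctness-dominant step signal can be illustrated on a toy rollout tree. The sketch below is a minimal illustration under stated assumptions, not PORTool's actual implementation. The `Node` class, `reaches_correct`, `step_importance`, `sibling_advantages`, the binary 0/1 signal, and the `aux` hook with `aux_weight=0.1` are all hypothetical. The abstract's auxiliary term is truncated, so here it is only a pluggable placeholder that defaults to zero. The sibling-mean baseline is one plausible way to compare "alternative tool-use decisions within the same context" and is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """One step (e.g. a tool call) in a hypothetical rollout tree.

    Names are assumed unique; leaves carry the outcome reward.
    """
    name: str
    children: List["Node"] = field(default_factory=list)
    correct: Optional[bool] = None  # set only on leaves (final answers)


def reaches_correct(node: Node, memo: Dict[str, bool]) -> bool:
    """True if any leaf at or below `node` produced a correct final answer."""
    if node.name in memo:
        return memo[node.name]
    if not node.children:
        result = bool(node.correct)
    else:
        result = any(reaches_correct(c, memo) for c in node.children)
    memo[node.name] = result
    return result


def step_importance(
    root: Node,
    aux: Callable[[Node], float] = lambda n: 0.0,  # hypothetical auxiliary term
    aux_weight: float = 0.1,  # hypothetical weight; correctness dominates
) -> Dict[str, float]:
    """Score each step: 1.0 if a correct answer is reachable, plus a small aux term."""
    memo: Dict[str, bool] = {}
    scores: Dict[str, float] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        signal = 1.0 if reaches_correct(node, memo) else 0.0
        scores[node.name] = signal + aux_weight * aux(node)
        stack.extend(node.children)
    return scores


def sibling_advantages(root: Node, scores: Dict[str, float]) -> Dict[str, float]:
    """Compare each child with its siblings (same shared prefix) via a mean baseline."""
    adv: Dict[str, float] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            mean = sum(scores[c.name] for c in node.children) / len(node.children)
            for c in node.children:
                adv[c.name] = scores[c.name] - mean
        stack.extend(node.children)
    return adv


# Toy tree: branch "a" can still reach a correct answer, branch "b" cannot.
tree = Node("root", [
    Node("a", [Node("a1", correct=True), Node("a2", correct=False)]),
    Node("b", [Node("b1", correct=False)]),
])
scores = step_importance(tree)
adv = sibling_advantages(tree, scores)
print(scores)
print(adv)
```

With the default zero auxiliary term, step `a` scores 1.0 and step `b` scores 0.0, so `a` receives a positive advantage (0.5) and `b` a negative one (-0.5) relative to their shared parent, even though only leaves carry outcome rewards. A single-child branch such as `b1` gets zero advantage because it has no alternative to be compared against.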

Authors

(none)

Tags

  • Multi-Agent

Stats

  • citations: 0
  • S2 citations: -
  • github stars: 0
  • HF likes: 0
  • heat score: 0.00
  • arxiv key: wu2026portool

Related papers