SCRIBE: Structured Mid-level Supervision For Tool-using Language Models
2026 · Yuxuan Jiang, Francis Ferraro
Abstract
arXiv:2601.03555v2
Training reliable tool-augmented agents remains a significant challenge, largely due to the difficulty of credit assignment in multi-step reasoning. While process-level reward models offer a promising direction, existing LLM-based judges often produce noisy and inconsistent signals because they lack fine-grained, task-specific rubrics to distinguish high-level planning from low-level execution. In this work, we introduce SCRIBE (Skill-Conditioned Reward with Intermediate Behavioral Evaluation), a reinforcement learning framework that intervenes at a novel mid-level abstraction. SCRIBE grounds reward modeling in a curated library of skill prototypes, transforming open-ended LLM evaluation into a constrained verification problem. By routing each subgoal to a corresponding prototype, the reward model is equipped with precise, structured rubrics that substantially reduce reward variance. Experimental results show that SCRIBE achieves s
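The routing idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `SkillPrototype`, `route`, `rubric_reward`, and `LIBRARY`, the keyword-overlap routing rule, and the deterministic rubric checks are all assumptions made for illustration. In the paper the rubric is given to an LLM-based reward model; here plain Python predicates stand in for that judge so the example stays runnable.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class SkillPrototype:
    """Hypothetical skill prototype: a name, routing keywords, and a rubric."""
    name: str
    keywords: FrozenSet[str]
    # Each rubric item is a yes/no check on one agent step (a plain dict here).
    rubric: List[Callable[[dict], bool]]


def route(subgoal: str, library: List[SkillPrototype]) -> SkillPrototype:
    """Pick the prototype whose keywords overlap most with the subgoal text.

    Keyword overlap is an assumed stand-in for whatever router is actually used.
    """
    words = set(subgoal.lower().split())
    return max(library, key=lambda p: len(p.keywords & words))


def rubric_reward(step: dict, prototype: SkillPrototype) -> float:
    """Fraction of rubric checks the step passes, a value in [0, 1]."""
    results = [check(step) for check in prototype.rubric]
    return sum(results) / len(results)


# Illustrative two-entry library; a real library would be curated per task.
LIBRARY = [
    SkillPrototype(
        name="lookup",
        keywords=frozenset({"search", "find", "lookup"}),
        rubric=[
            lambda s: s.get("tool") == "search",
            lambda s: bool(s.get("query")),
        ],
    ),
    SkillPrototype(
        name="compute",
        keywords=frozenset({"compute", "calculate", "sum"}),
        rubric=[
            lambda s: s.get("tool") == "calculator",
            lambda s: "expression" in s,
        ],
    ),
]

if __name__ == "__main__":
    proto = route("find the population of Paris", LIBRARY)
    step = {"tool": "search", "query": "population of Paris"}
    print(proto.name, rubric_reward(step, proto))
```

Because each rubric item is a closed yes/no question, the score depends only on which checks pass, which is one way to read the abstract's claim that a constrained verification problem yields lower-variance rewards than open-ended judging.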
Related papers
- Aligning Agents Via Planning: A Benchmark For Trajectory-level Reward Modeling (2026)
- Efficiently Aligning Language Models With Online Natural Language Feedback (2026)
- Co-evolution Of Policy And Internal Reward For Language Agents (2026)
- From Laws To Motivation: Guiding Exploration Through Law-based Reasoning And Rewards (2024)
- MICA: Multi-granularity Intertemporal Credit Assignment For Long-horizon Emotional Support Dialogue (2026)
- Self-induced Outcome Potential: Turn-level Credit Assignment For Agents Without Verifiers (2026)
- Reward Hacking Benchmark: Measuring Exploits In LLM Agents With Tool Use (2026)
- MARSHAL: Incentivizing Multi-agent Reasoning Via Self-play With Strategic LLMs (2025)