
ReflexGrad: Within-Episode Failure Recovery in LLM Agents via Progress-Gated Dual-Process Routing

Abstract

We present ReflexGrad, a dual-process architecture for within-episode failure recovery in LLM agents that requires no demonstrations. When an agent commits to a wrong approach early and exhausts its step budget, the post-failure trajectory contains the information needed to escape, yet no published architecture acts on that information within a single episode. ReflexGrad routes between a fast process (TextGrad-style continuous refinement every k = 3 steps) and a slow process (Reflexion-style causal diagnosis, triggered when m = 5 consecutive low-progress scores fire a routing gate). A deterministic priority merge keeps the natural-language policy coherent, and each slow activation emits three observable artifacts: a reproducible trigger, a causal diagnostic, and a verified fix. On ALFWorld (134 tasks, n = 10 seeds, no demonstrations), ReflexGrad lifts Qwen-3-8B from 35.1% to 75.4% (+40.3 pp). It beats compute-matched 1-shot LATS by +2.7 pp (p ≈ 0.01), ToT by +5.7 pp (p < 10⁻⁴), and Self-Refine by +6.7 pp (p < 10⁻⁵). On GPT-5 the lift is 46.3% → 88.1% (+41.8 pp). The 1.5 pp cross-model difference in lift is within seed noise (p ≈ 0.13), suggesting that the routing mechanism, rather than model scale, is the primary source of the gain. Code, prompts, per-seed logs, and sensitivity sweeps are released.
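The routing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released ReflexGrad code. Only k = 3 and m = 5 come from the abstract. Everything else is hypothetical and chosen for illustration: the progress score range of [0, 1], the low-progress cutoff `LOW_PROGRESS`, resetting the streak after the gate fires, letting a slow activation pre-empt a fast one on the same step, and the rule-keyed policy dictionary used by `priority_merge`. The LLM calls themselves (refinement and causal diagnosis) are not modeled; the sketch only decides *which* process fires and how their edits are combined.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of progress-gated dual-process routing.
# K_FAST and M_SLOW follow the abstract; LOW_PROGRESS is an assumed cutoff.
K_FAST = 3          # fast process (continuous refinement) runs every k steps
M_SLOW = 5          # consecutive low-progress scores needed to fire the gate
LOW_PROGRESS = 0.2  # assumption: a score in [0, 1] below this counts as "low"


@dataclass
class Router:
    low_streak: int = 0
    events: list = field(default_factory=list)  # (step, process) log

    def step(self, t: int, progress: float) -> Optional[str]:
        """Return which process fires after step t (1-indexed), or None."""
        # Count consecutive low-progress steps; any real progress resets it.
        self.low_streak = self.low_streak + 1 if progress < LOW_PROGRESS else 0
        if self.low_streak >= M_SLOW:
            # Assumption: the gate resets after firing, and the slow process
            # pre-empts a fast update scheduled for the same step.
            self.low_streak = 0
            self.events.append((t, "slow"))
            return "slow"
        if t % K_FAST == 0:
            self.events.append((t, "fast"))
            return "fast"
        return None


def priority_merge(policy: dict, fast_edits: dict, slow_edits: dict) -> dict:
    """Deterministically merge rule edits into a natural-language policy.

    Assumption: the policy is keyed by rule id, and on a conflict the
    slow-process (causal diagnosis) edit overrides the fast-process edit.
    """
    merged = dict(policy)
    merged.update(fast_edits)
    merged.update(slow_edits)  # applied last, so slow edits win conflicts
    return merged


# Usage: a stalled stretch (steps 2-6) trips the gate at step 6.
router = Router()
scores = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1, 0.6, 0.0, 0.4]
for t, s in enumerate(scores, start=1):
    router.step(t, s)
print(router.events)  # [(3, 'fast'), (6, 'slow'), (9, 'fast')]
```

Applying slow edits last is one simple way to make the merge deterministic: the same pair of edit sets always yields the same policy, regardless of the order in which the two processes produced them.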
