Abstract
We present ReflexGrad, a dual-process architecture for within-episode failure recovery in LLM agents without demonstrations. When agents commit to a wrong approach early and exhaust the step budget, the post-failure trajectory contains the information needed to escape, but no published architecture acts on it within a single episode. ReflexGrad routes between a fast process (TextGrad-style continuous refinement every steps) and a slow process (Reflexion-style causal diagnosis when consecutive low-progress scores fire a routing gate). A deterministic priority merge keeps the natural-language policy coherent, and each slow activation emits three observable artifacts: a reproducible trigger, a causal diagnostic, and a verified fix. On ALFWorld 134 tasks, seeds, no demonstrations, ReflexGrad lifts Qwen-3-8B from to (pp), beating compute-matched 1-shot LATS by pp (), ToT by pp (), and Self-Refine by pp (); on GPT-5 the lift is (pp). The pp cross-model difference is within seed noise (), suggesting that the routing mechanism, rather than model scale, is the primary source of the gain. Code, prompts, per-seed logs, and sensitivity sweeps are released.
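
As a rough illustration of the routing idea described above, the sketch below shows one way a gate could switch from the fast process to the slow process after a run of consecutive low-progress scores. It is not the released implementation: the class name `RoutingGate`, the function `route_episode`, the `threshold` and `patience` values, and the choice to reset the streak after firing are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RoutingGate:
    """Hypothetical gate: fires after `patience` consecutive low-progress scores."""
    threshold: float = 0.3  # assumed cutoff below which a score counts as low progress
    patience: int = 3       # assumed number of consecutive low scores needed to fire
    _streak: int = 0        # running count of consecutive low scores

    def update(self, progress_score: float) -> str:
        """Return "slow" when the gate fires on this step, otherwise "fast"."""
        if progress_score < self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        if self._streak >= self.patience:
            # Assumption: reset so the slow process is not re-triggered on the next step.
            self._streak = 0
            return "slow"
        return "fast"


def route_episode(scores, gate=None):
    """Map a sequence of per-step progress scores to the process chosen at each step."""
    if gate is None:
        gate = RoutingGate()
    return [gate.update(score) for score in scores]


if __name__ == "__main__":
    print(route_episode([0.5, 0.1, 0.2, 0.1, 0.1, 0.6]))
```

Because the trigger depends only on the sequence of scores, the same trajectory always activates the slow process at the same step, which is one plausible reading of the "reproducible trigger" artifact.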