{"ID":2840099,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.14584","arxiv_id":"2511.14584","title":"ReflexGrad: Within-Episode Failure Recovery in LLM Agents via Progress-Gated Dual-Process Routing","abstract":"We present ReflexGrad, a dual-process architecture for within-episode failure recovery in LLM agents without demonstrations. When agents commit to a wrong approach early and exhaust the step budget, the post-failure trajectory contains the information to escape -- but no published architecture acts on it within a single episode. ReflexGrad routes between a fast process (TextGrad-style continuous refinement every $k{=}3$ steps) and a slow process (Reflexion-style causal diagnosis when $m{=}5$ consecutive low-progress scores fire a routing gate). A deterministic priority merge keeps the natural-language policy coherent, and each slow activation emits three observable artifacts: a reproducible trigger, a causal diagnostic, and a verified fix. On ALFWorld 134 tasks, $n{=}10$ seeds, no demonstrations, ReflexGrad lifts Qwen-3-8B from $35.1\\%$ to $75.4\\%$ ($+40.3$pp), beating compute-matched 1-shot LATS by $+2.7$pp ($p{\\approx}0.01$), ToT by $+5.7$pp ($p{\u003c}10^{-4}$), and Self-Refine by $+6.7$pp ($p{\u003c}10^{-5}$); on GPT-5 the lift is $46.3{\\to}88.1\\%$ ($+41.8$pp). The $1.5$pp cross-model difference is within seed noise ($p{\\approx}0.13$), suggesting that the routing mechanism, rather than model scale, is the primary source of the gain. Code, prompts, per-seed logs, and sensitivity sweeps are released.","short_abstract":"We present ReflexGrad, a dual-process architecture for within-episode failure recovery in LLM agents without demonstrations. When agents commit to a wrong approach early and exhaust the step budget, the post-failure trajectory contains the information to escape -- but no published architecture acts on it within a singl...","url_abs":"https://arxiv.org/abs/2511.14584","url_pdf":"https://arxiv.org/pdf/2511.14584v4","authors":"[\"Ankush Kadu\",\"Aswanth Krishnan\"]","published":"2025-11-18T15:25:05Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false}
