{"ID":2878226,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.19229","arxiv_id":"2508.19229","title":"StepWiser: Stepwise Generative Judges for Wiser Reasoning","abstract":"As models increasingly leverage multi-step reasoning strategies to solve complex problems, supervising the logical validity of these intermediate steps has become a critical research challenge. Process reward models address this by providing step-by-step feedback, but current approaches have two major drawbacks: they typically function as classifiers without providing explanations, and their reliance on supervised fine-tuning with static datasets limits generalization. Inspired by recent advances, we reframe stepwise reward modeling from a classification task to a reasoning task itself. We thus propose a generative judge that reasons about the policy model's reasoning steps (i.e., meta-reasons), outputting thinking tokens before delivering a final verdict. Our model, StepWiser, is trained by reinforcement learning using relative outcomes of rollouts. We show it provides (i) better judgment accuracy on intermediate steps than existing methods; (ii) can be used to improve the policy model at training time; and (iii) improves inference-time search.","short_abstract":"As models increasingly leverage multi-step reasoning strategies to solve complex problems, supervising the logical validity of these intermediate steps has become a critical research challenge. Process reward models address this by providing step-by-step feedback, but current approaches have two major drawbacks: they t...","url_abs":"https://arxiv.org/abs/2508.19229","url_pdf":"https://arxiv.org/pdf/2508.19229v2","authors":"[\"Wei Xiong\",\"Wenting Zhao\",\"Weizhe Yuan\",\"Olga Golovneva\",\"Tong Zhang\",\"Jason Weston\",\"Sainbayar Sukhbaatar\"]","published":"2025-08-26T17:45:05Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
