{"ID":2866306,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.19681","arxiv_id":"2509.19681","title":"Calibrated Reasoning: An Explanatory Verifier for Dynamic and Efficient Problem-Solving","abstract":"Advanced test-time computing strategies are essential for scaling reasoning models, but their effectiveness is capped by the models' poor self-evaluation. We propose a pairwise Explanatory Verifier, trained via reinforcement learning (GRPO), that produces calibrated confidence scores and associated natural language reasoning for generated solutions. Our verifier improves the accuracy and efficiency of test-time strategies like best-of-n and self-reflection. Crucially, it excels at identifying challenging failure modes, such as when both candidate solutions are identically incorrect, succeeding where standard methods like majority voting fail.","short_abstract":"Advanced test-time computing strategies are essential for scaling reasoning models, but their effectiveness is capped by the models' poor self-evaluation. We propose a pairwise Explanatory Verifier, trained via reinforcement learning (GRPO), that produces calibrated confidence scores and associated natural language rea...","url_abs":"https://arxiv.org/abs/2509.19681","url_pdf":"https://arxiv.org/pdf/2509.19681v1","authors":"[\"Anisha Garg\",\"Engin Tekin\",\"Yash More\",\"David Bick\",\"Nishit Neema\",\"Ganesh Venkatesh\"]","published":"2025-09-24T01:36:00Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
