{"ID":2854777,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.14913","arxiv_id":"2510.14913","title":"Budget-aware Test-time Scaling via Discriminative Verification","abstract":"Test-time scaling is a powerful strategy for boosting the performance of large language models on complex reasoning tasks. While state-of-the-art approaches often employ generative verifiers to select the best solution from a pool of candidates, this method incurs prohibitive computational costs, limiting its practicality. In this work, we shift the focus to a more budget-aware paradigm: discriminative verification. We conduct a thorough empirical analysis and demonstrate that while discriminative verifiers may underperform in isolation, combining them with self-consistency in a hybrid approach creates a powerful and efficient test-time scaling mechanism. Notably, under a fixed compute budget, this hybrid approach surpasses state-of-the-art generative verification by a significant margin: achieving up to 15.3\\% higher accuracy on AIME2025. Our findings establish that for practical, real-world applications, budget-aware scaling with discriminative verifiers is not only a \"free\" upgrade over self-consistency, but also a more effective and efficient alternative to costly generative techniques. Code is available at https://github.com/wang-research-lab/verification.","short_abstract":"Test-time scaling is a powerful strategy for boosting the performance of large language models on complex reasoning tasks. \nWhile state-of-the-art approaches often employ generative verifiers to select the best solution from a pool of candidates, this method incurs prohibitive computational costs, limiting its practical...","url_abs":"https://arxiv.org/abs/2510.14913","url_pdf":"https://arxiv.org/pdf/2510.14913v1","authors":"[\"Kyle Montgomery\",\"Sijun Tan\",\"Yuqi Chen\",\"Siyuan Zhuang\",\"Tianjun Zhang\",\"Raluca Ada Popa\",\"Chenguang Wang\"]","published":"2025-10-16T17:30:02Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\",\"cs.LG\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":608196,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2854777,"paper_url":"https://arxiv.org/abs/2510.14913","paper_title":"Budget-aware Test-time Scaling via Discriminative Verification","repo_url":"https://github.com/wang-research-lab/verification","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
