{"ID":2892207,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.15512","arxiv_id":"2507.15512","title":"Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language Models","abstract":"Test-Time Scaling (TTS) is a promising approach to progressively elicit the model's intelligence during inference. Recently, training-based TTS methods, such as continued reinforcement learning (RL), have further surged in popularity, while training-free TTS methods are gradually fading from prominence. However, the additional computation overhead of training amplifies the burden on test-time scaling. In this paper, we focus on training-free TTS methods for reasoning. We first design Conditional Step-level Self-refinement, a fine-grained sequential scaling method guided by process verification. On top of its effectiveness, we further combine it with other classical parallel scaling methods at the step level, to introduce a novel inference paradigm called Hybrid Test-Time Scaling. Extensive experiments on five instruction-tuned LLMs across different scales (3B-14B) and families demonstrate that hybrid strategy incorporating various training-free TTS methods at a fine granularity has considerable potential for expanding the reasoning performance boundaries of LLMs.","short_abstract":"Test-Time Scaling (TTS) is a promising approach to progressively elicit the model's intelligence during inference. Recently, training-based TTS methods, such as continued reinforcement learning (RL), have further surged in popularity, while training-free TTS methods are gradually fading from prominence. However, the ad...","url_abs":"https://arxiv.org/abs/2507.15512","url_pdf":"https://arxiv.org/pdf/2507.15512v3","authors":"[\"Kaiyan Chang\",\"Yonghao Shi\",\"Chenglong Wang\",\"Hang Zhou\",\"Chi Hu\",\"Xiaoqian Liu\",\"Yingfeng Luo\",\"Yuan Ge\",\"Tong Xiao\",\"Jingbo Zhu\"]","published":"2025-07-21T11:28:09Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
