{"ID":2856970,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.09988","arxiv_id":"2510.09988","title":"Unifying Tree Search Algorithm and Reward Design for LLM Reasoning: A Survey","abstract":"Deliberative tree search is a cornerstone of modern Large Language Model (LLM) research, driving the pivot from brute-force scaling toward algorithmic efficiency. This single paradigm unifies two critical frontiers: \\textbf{Test-Time Scaling (TTS)}, which deploys on-demand computation to solve hard problems, and \\textbf{Self-Improvement}, which uses search-generated data to durably enhance model parameters. However, this burgeoning field is fragmented and lacks a common formalism, particularly concerning the ambiguous role of the reward signal -- is it a transient heuristic or a durable learning target? This paper resolves this ambiguity by introducing a unified framework that deconstructs search algorithms into three core components: the \\emph{Search Mechanism}, \\emph{Reward Formulation}, and \\emph{Transition Function}. We establish a formal distinction between transient \\textbf{Search Guidance} for TTS and durable \\textbf{Parametric Reward Modeling} for Self-Improvement. Building on this formalism, we introduce a component-centric taxonomy, synthesize the state-of-the-art, and chart a research roadmap toward more systematic progress in creating autonomous, self-improving agents.","short_abstract":"Deliberative tree search is a cornerstone of modern Large Language Model (LLM) research, driving the pivot from brute-force scaling toward algorithmic efficiency. This single paradigm unifies two critical frontiers: \\textbf{Test-Time Scaling (TTS)}, which deploys on-demand computation to solve hard problems, and \\textb...","url_abs":"https://arxiv.org/abs/2510.09988","url_pdf":"https://arxiv.org/pdf/2510.09988v1","authors":"[\"Jiaqi Wei\",\"Xiang Zhang\",\"Yuejin Yang\",\"Wenxuan Huang\",\"Juntai Cao\",\"Sheng Xu\",\"Xiang Zhuang\",\"Zhangyang Gao\",\"Muhammad Abdul-Mageed\",\"Laks V. S. Lakshmanan\",\"Chenyu You\",\"Wanli Ouyang\",\"Siqi Sun\"]","published":"2025-10-11T03:29:18Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
