{"ID":2874213,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.05007","arxiv_id":"2509.05007","title":"Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework","abstract":"Large reasoning models (LRMs) have exhibited strong performance on complex reasoning tasks, with further gains achievable through increased computational budgets at inference. However, current test-time scaling methods predominantly rely on redundant sampling, ignoring the historical experience utilization, thereby limiting computational efficiency. To overcome this limitation, we propose Sticker-TTS, a novel test-time scaling framework that coordinates three collaborative LRMs to iteratively explore and refine solutions guided by historical attempts. At the core of our framework are distilled key conditions-termed stickers-which drive the extraction, refinement, and reuse of critical information across multiple rounds of reasoning. To further enhance the efficiency and performance of our framework, we introduce a two-stage optimization strategy that combines imitation learning with self-improvement, enabling progressive refinement. Extensive evaluations on three challenging mathematical reasoning benchmarks, including AIME-24, AIME-25, and OlymMATH, demonstrate that Sticker-TTS consistently surpasses strong baselines, including self-consistency and advanced reinforcement learning approaches, under comparable inference budgets. These results highlight the effectiveness of sticker-guided historical experience utilization. Our code and data are available at https://github.com/RUCAIBox/Sticker-TTS.","short_abstract":"Large reasoning models (LRMs) have exhibited strong performance on complex reasoning tasks, with further gains achievable through increased computational budgets at inference. However, current test-time scaling methods predominantly rely on redundant sampling, ignoring the historical experience utilization, thereby lim...","url_abs":"https://arxiv.org/abs/2509.05007","url_pdf":"https://arxiv.org/pdf/2509.05007v2","authors":"[\"Jie Chen\",\"Jinhao Jiang\",\"Yingqian Min\",\"Zican Dong\",\"Shijie Wang\",\"Wayne Xin Zhao\",\"Ji-Rong Wen\"]","published":"2025-09-05T11:14:11Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\"]","methods":"[\"Reinforcement Learning\"]","has_code":false,"code_links":[{"ID":610119,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2874213,"paper_url":"https://arxiv.org/abs/2509.05007","paper_title":"Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework","repo_url":"https://github.com/RUCAIBox/Sticker-TTS","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
