{"ID":2840067,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.15738","arxiv_id":"2511.15738","title":"Extending Test-Time Scaling: A 3D Perspective with Context, Batch, and Turn","abstract":"Reasoning reinforcement learning (RL) has recently revealed a new scaling effect: test-time scaling. Thinking models such as R1 and o1 improve their reasoning accuracy at test time as the length of the reasoning context increases. However, compared with training-time scaling, test-time scaling is fundamentally limited by the limited context length of base models, which remains orders of magnitude smaller than the amount of tokens consumed during training. We revisit test-time enhancement techniques through the lens of scaling effect and introduce a unified framework of multi-dimensional test-time scaling to extend the capacity of test-time reasoning. Beyond conventional context-length scaling, we consider two additional dimensions: batch scaling, where accuracy improves with parallel sampling, and turn scaling, where iterative self-refinement enhances reasoning quality. Building on this perspective, we propose 3D test-time scaling, which integrates context, batch, and turn scaling. We show that: (1) each dimension demonstrates a test-time scaling effect, but with a bounded capacity; (2) combining all three dimensions substantially improves the reasoning performance of challenging testbeds, including IOI, IMO, and CPHO, and further benefits from human preference feedback; and (3) the human-in-the-loop framework naturally extends to a more open-ended domain, i.e., embodied learning, which enables the design of humanoid control behaviors.","short_abstract":"Reasoning reinforcement learning (RL) has recently revealed a new scaling effect: test-time scaling. Thinking models such as R1 and o1 improve their reasoning accuracy at test time as the length of the reasoning context increases. However, compared with training-time scaling, test-time scaling is fundamentally limited...","url_abs":"https://arxiv.org/abs/2511.15738","url_pdf":"https://arxiv.org/pdf/2511.15738v2","authors":"[\"Chao Yu\",\"Qixin Tan\",\"Jiaxuan Gao\",\"Shi Yu\",\"Hong Lu\",\"Xinting Yang\",\"Zelai Xu\",\"Yu Wang\",\"Yi Wu\",\"Eugene Vinitsky\"]","published":"2025-11-18T14:07:57Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
