{"ID":2844120,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.07372","arxiv_id":"2511.07372","title":"Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training","abstract":"Recent curriculum techniques in the post-training stage of LLMs have been empirically observed to outperform non-curriculum approaches in improving reasoning performance, yet a principled understanding of their effectiveness and limitations remains incomplete. To bridge this gap, we develop an abstract theoretical framework and identify sufficient conditions under which curriculum post-training yields exponential improvements in sample complexity. To substantiate this framework, we model the base model's Chain-of-Thought generation as a state-conditioned autoregressive reasoning tree, and formalize curriculum subtasks as either depth-increasing curricula that progressively extend reasoning horizons or hint-decreasing curricula that gradually remove partial hints. Our analysis shows that reinforcement learning finetuning with both curriculum strategies achieves high accuracy with polynomial sample complexity, whereas non-curriculum counterpart encounters an exponential complexity bottleneck. We further establish analogous guarantees for test-time scaling. Empirical simulations support our theoretical findings. Code is available at https://github.com/DakeBU/Curriculum-Post-training.","short_abstract":"Recent curriculum techniques in the post-training stage of LLMs have been empirically observed to outperform non-curriculum approaches in improving reasoning performance, yet a principled understanding of their effectiveness and limitations remains incomplete. To bridge this gap, we develop an abstract theoretical fram...","url_abs":"https://arxiv.org/abs/2511.07372","url_pdf":"https://arxiv.org/pdf/2511.07372v3","authors":"[\"Dake Bu\",\"Wei Huang\",\"Andi Han\",\"Atsushi Nitanda\",\"Hau-San Wong\",\"Qingfu Zhang\",\"Taiji Suzuki\"]","published":"2025-11-10T18:29:54Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"Transformer\",\"Large Language Model\"]","has_code":false,"code_links":[{"ID":607270,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2844120,"paper_url":"https://arxiv.org/abs/2511.07372","paper_title":"Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training","repo_url":"https://github.com/DakeBU/Curriculum-Post-training","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
