{"ID":2923170,"CreatedAt":"2026-06-02T03:17:13.356150003Z","UpdatedAt":"2026-06-04T07:41:34.29888543Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.02113","arxiv_id":"2606.02113","title":"A Primer in Post-Training Reasoning Data: What We Know About How It Works","abstract":"Post-training has become a primary driver of recent progress in large reasoning models, and reasoning data are often the key variable determining whether this stage succeeds. Work on post-training reasoning data has grown rapidly, yet this literature remains scattered across dataset papers, reinforcement-learning recipes, reward-model studies, benchmarks, and frontier system reports. This paper is the first primer to synthesize over 150 key public studies and system reports on post-training reasoning data. We organize the field around four questions: what data objects exist, what makes them useful, how they are constructed, and how they scale. Together, this organization provides an attribution framework for future reasoning-data releases and post-training recipes.","short_abstract":"Post-training has become a primary driver of recent progress in large reasoning models, and reasoning data are often the key variable determining whether this stage succeeds. Work on post-training reasoning data has grown rapidly, yet this literature remains scattered across dataset papers, reinforcement-learning recip...","url_abs":"https://arxiv.org/abs/2606.02113","url_pdf":"https://arxiv.org/pdf/2606.02113v1","authors":"[\"Yaoming Li\",\"Guangxiang Zhao\",\"Qilong Shi\",\"Lin Sun\",\"Xiangzheng Zhang\",\"Tong Yang\"]","published":"2026-06-01T11:45:50Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Generative Adversarial Network\"]","has_code":false}
