{"ID":2830041,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.11463","arxiv_id":"2512.11463","title":"Motif-2-12.7B-Reasoning: A Practitioner's Guide to RL Training Recipes","abstract":"We introduce Motif-2-12.7B-Reasoning, a 12.7B parameter language model designed to bridge the gap between open-weight systems and proprietary frontier models in complex reasoning and long-context understanding. Addressing the common challenges of model collapse and training instability in reasoning adaptation, we propose a comprehensive, reproducible training recipe spanning system, data, and algorithmic optimizations. Our approach combines memory-efficient infrastructure for 64K-token contexts using hybrid parallelism and kernel-level optimizations with a two-stage Supervised Fine-Tuning (SFT) curriculum that mitigates distribution mismatch through verified, aligned synthetic data. Furthermore, we detail a robust Reinforcement Learning Fine-Tuning (RLFT) pipeline that stabilizes training via difficulty-aware data filtering and mixed-policy trajectory reuse. Empirical results demonstrate that Motif-2-12.7B-Reasoning achieves performance comparable to models with significantly larger parameter counts across mathematics, coding, and agentic benchmarks, offering the community a competitive open model and a practical blueprint for scaling reasoning capabilities under realistic compute constraints.","short_abstract":"We introduce Motif-2-12.7B-Reasoning, a 12.7B parameter language model designed to bridge the gap between open-weight systems and proprietary frontier models in complex reasoning and long-context understanding. Addressing the common challenges of model collapse and training instability in reasoning adaptation, we propo...","url_abs":"https://arxiv.org/abs/2512.11463","url_pdf":"https://arxiv.org/pdf/2512.11463v1","authors":"[\"Junghwan Lim\",\"Sungmin Lee\",\"Dongseok Kim\",\"Taehyun Kim\",\"Eunhwan Park\",\"Jeesoo Lee\",\"Jeongdoo Lee\",\"Junhyeok Lee\",\"Wai Ting Cheung\",\"Dahye Choi\",\"Minsu Ha\",\"Jaeheui Her\",\"Jaeyeon Huh\",\"Hanbin Jung\",\"Changjin Kang\",\"Beomgyu Kim\",\"Minjae Kim\",\"Taewhan Kim\",\"Youngrok Kim\",\"Hyukjin Kweon\",\"Haesol Lee\",\"Kungyu Lee\",\"Dongpin Oh\",\"Yeongjae Park\",\"Bokki Ryu\",\"Dongjoo Weon\"]","published":"2025-12-11T00:51:18Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"Language Model\"]","has_code":false}
