{"ID":2844887,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.07464","arxiv_id":"2511.07464","title":"Motif 2 12.7B technical report","abstract":"We introduce Motif-2-12.7B, a new open-weight foundation model that pushes the efficiency frontier of large language models by combining architectural innovation with system-level optimization. Designed for scalable language understanding and robust instruction generalization under constrained compute budgets, Motif-2-12.7B builds upon Motif-2.6B with the integration of Grouped Differential Attention (GDA), which improves representational efficiency by disentangling signal and noise-control attention pathways. The model is pre-trained on 5.5 trillion tokens spanning diverse linguistic, mathematical, scientific, and programming domains using a curriculum-driven data scheduler that gradually changes the data composition ratio. The training system leverages the MuonClip optimizer alongside custom high-performance kernels, including fused PolyNorm activations and the Parallel Muon algorithm, yielding significant throughput and memory efficiency gains in large-scale distributed environments. Post-training employs a three-stage supervised fine-tuning pipeline that successively enhances general instruction adherence, compositional understanding, and linguistic precision. Motif-2-12.7B demonstrates competitive performance across diverse benchmarks, showing that thoughtful architectural scaling and optimized training design can rival the capabilities of much larger models.","short_abstract":"We introduce Motif-2-12.7B, a new open-weight foundation model that pushes the efficiency frontier of large language models by combining architectural innovation with system-level optimization. Designed for scalable language understanding and robust instruction generalization under constrained compute budgets, Motif-2-...","url_abs":"https://arxiv.org/abs/2511.07464","url_pdf":"https://arxiv.org/pdf/2511.07464v1","authors":"[\"Junghwan Lim\",\"Sungmin Lee\",\"Dongseok Kim\",\"Taehyun Kim\",\"Eunhwan Park\",\"Jeesoo Lee\",\"Jeongdoo Lee\",\"Junhyeok Lee\",\"Wai Ting Cheung\",\"Dahye Choi\",\"Jaeheui Her\",\"Jaeyeon Huh\",\"Hanbin Jung\",\"Changjin Kang\",\"Beomgyu Kim\",\"Minjae Kim\",\"Taewhan Kim\",\"Youngrok Kim\",\"Hyukjin Kweon\",\"Haesol Lee\",\"Kungyu Lee\",\"Dongpin Oh\",\"Yeongjae Park\",\"Bokki Ryu\",\"Dongjoo Weon\"]","published":"2025-11-07T10:32:16Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}