{"ID":2891469,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.17728","arxiv_id":"2507.17728","title":"Megrez2 Technical Report","abstract":"We present Megrez2, a novel lightweight and high-performance language model architecture optimized for device native deployment. Megrez2 introduces a novel cross-layer expert sharing mechanism, which significantly reduces total parameter count by reusing expert modules across adjacent transformer layers while maintaining most of the model's capacity. It also incorporates pre-gated routing, enabling memory-efficient expert loading and faster inference. As the first instantiation of the Megrez2 architecture, we introduce the Megrez2-Preview model, which is pre-trained on a 5-trillion-token corpus and further enhanced through supervised fine-tuning and reinforcement learning with verifiable rewards. With only 3B activated and 7.5B stored parameters, Megrez2-Preview demonstrates competitive or superior performance compared to larger models on a wide range of tasks, including language understanding, instruction following, mathematical reasoning, and code generation. These results highlight the effectiveness of the Megrez2 architecture to achieve a balance between accuracy, efficiency, and deployability, making it a strong candidate for real-world, resource-constrained applications.","short_abstract":"We present Megrez2, a novel lightweight and high-performance language model architecture optimized for device native deployment. Megrez2 introduces a novel cross-layer expert sharing mechanism, which significantly reduces total parameter count by reusing expert modules across adjacent transformer layers while maintaini...","url_abs":"https://arxiv.org/abs/2507.17728","url_pdf":"https://arxiv.org/pdf/2507.17728v1","authors":"[\"Boxun Li\",\"Yadong Li\",\"Zhiyuan Li\",\"Congyi Liu\",\"Weilin Liu\",\"Guowei Niu\",\"Zheyue Tan\",\"Haiyang Xu\",\"Zhuyu Yao\",\"Tao Yuan\",\"Dong Zhou\",\"Yueqing Zhuang\",\"Bo Zhao\",\"Guohao Dai\",\"Yu Wang\"]","published":"2025-07-23T17:43:07Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Reinforcement Learning\",\"Transformer\",\"Language Model\"]","has_code":false}
