{"ID":2832170,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.06266","arxiv_id":"2512.06266","title":"Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models","abstract":"We present Nanbeige4-3B, a family of small-scale but high-performing language models. Pretrained on 23T high-quality tokens and finetuned on over 30 million diverse instructions, we extend the boundary of the scaling law for small language models. In pre-training, we design a Fine-Grained Warmup-Stable-Decay (FG-WSD) training scheduler, which progressively refines data mixtures across stages to boost model performance. In post-training, to improve the quality of the SFT data, we design a joint mechanism that integrates deliberative generation refinement and chain-of-thought reconstruction, yielding substantial gains on complex tasks. Following SFT, we employ our flagship reasoning model to distill Nanbeige4-3B through our proposed Dual Preference Distillation (DPD) method, which leads to further performance gains. Finally, a multi-stage reinforcement learning phase was applied, leveraging verifiable rewards and preference modeling to strengthen abilities on both reasoning and human alignment. Extensive evaluations show that Nanbeige4-3B not only significantly outperforms models of comparable parameter scale but also rivals much larger models across a wide range of benchmarks. The model checkpoints are available at https://huggingface.co/Nanbeige.","short_abstract":"We present Nanbeige4-3B, a family of small-scale but high-performing language models. Pretrained on 23T high-quality tokens and finetuned on over 30 million diverse instructions, we extend the boundary of the scaling law for small language models. \nIn pre-training, we design a Fine-Grained Warmup-Stable-Decay (FG-WSD) t...","url_abs":"https://arxiv.org/abs/2512.06266","url_pdf":"https://arxiv.org/pdf/2512.06266v1","authors":"[\"Chen Yang\",\"Guangyue Peng\",\"Jiaying Zhu\",\"Ran Le\",\"Ruixiang Feng\",\"Tao Zhang\",\"Wei Ruan\",\"Xiaoqi Liu\",\"Xiaoxue Cheng\",\"Xiyun Xu\",\"Yang Song\",\"Yanzipeng Gao\",\"Yiming Jia\",\"Yun Xing\",\"Yuntao Wen\",\"Zekai Wang\",\"Zhenwei An\",\"Zhicong Sun\",\"Zongchao Chen\"]","published":"2025-12-06T03:36:27Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Reinforcement Learning\",\"Language Model\"]","has_code":false}
