{"ID":2830635,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.15745","arxiv_id":"2512.15745","title":"LLaDA2.0: Scaling Up Diffusion Language Models to 100B","abstract":"This paper presents LLaDA2.0 -- a tuple of discrete diffusion large language models (dLLM) scaling up to 100B total parameters through systematic conversion from auto-regressive (AR) models -- establishing a new paradigm for frontier-scale deployment. Instead of costly training from scratch, LLaDA2.0 upholds knowledge inheritance, progressive adaption and efficiency-aware design principle, and seamless converts a pre-trained AR model into dLLM with a novel 3-phase block-level WSD based training scheme: progressive increasing block-size in block diffusion (warm-up), large-scale full-sequence diffusion (stable) and reverting back to compact-size block diffusion (decay). Along with post-training alignment with SFT and DPO, we obtain LLaDA2.0-mini (16B) and LLaDA2.0-flash (100B), two instruction-tuned Mixture-of-Experts (MoE) variants optimized for practical deployment. By preserving the advantages of parallel decoding, these models deliver superior performance and efficiency at the frontier scale. Both models were open-sourced.","short_abstract":"This paper presents LLaDA2.0 -- a tuple of discrete diffusion large language models (dLLM) scaling up to 100B total parameters through systematic conversion from auto-regressive (AR) models -- establishing a new paradigm for frontier-scale deployment. Instead of costly training from scratch, LLaDA2.0 upholds knowledge...","url_abs":"https://arxiv.org/abs/2512.15745","url_pdf":"https://arxiv.org/pdf/2512.15745v2","authors":"[\"Tiwei Bie\",\"Maosong Cao\",\"Kun Chen\",\"Lun Du\",\"Mingliang Gong\",\"Zhuochen Gong\",\"Yanmei Gu\",\"Jiaqi Hu\",\"Zenan Huang\",\"Zhenzhong Lan\",\"Chengxi Li\",\"Chongxuan Li\",\"Jianguo Li\",\"Zehuan Li\",\"Huabin Liu\",\"Lin Liu\",\"Guoshan Lu\",\"Xiaocheng Lu\",\"Yuxin Ma\",\"Jianfeng Tan\",\"Lanning Wei\",\"Ji-Rong Wen\",\"Yipeng Xing\",\"Xiaolu Zhang\",\"Junbo Zhao\",\"Da Zheng\",\"Jun Zhou\",\"Junlin Zhou\",\"Zhanchao Zhou\",\"Liwang Zhu\",\"Yihong Zhuang\"]","published":"2025-12-10T09:26:18Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.CL\"]","methods":"[\"Diffusion Model\",\"Large Language Model\",\"Language Model\"]","has_code":false}
