{"ID":2864383,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.03280","arxiv_id":"2510.03280","title":"Training Optimal Large Diffusion Language Models","abstract":"We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs), encompassing both compute-constrained and data-constrained regimes, and studying the key modeling and optimization designs. Quokka is a good friend of Chinchilla and provides wider scopes. We hope the results would bring short-term practical guidance in DLMs training and long-term inspirations for the whole AI community.","short_abstract":"We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs), encompassing both compute-constrained and data-constrained regimes, and studying the key modeling and optimization designs. Quokka is a good friend of Chinchilla and provides wider scopes. We hope the results would bring short-t...","url_abs":"https://arxiv.org/abs/2510.03280","url_pdf":"https://arxiv.org/pdf/2510.03280v2","authors":"[\"Jinjie Ni\",\"Qian Liu\",\"Chao Du\",\"Longxu Dou\",\"Hang Yan\",\"Zili Wang\",\"Tianyu Pang\",\"Michael Qizhe Shieh\"]","published":"2025-09-28T16:20:02Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.CL\"]","methods":"[\"Diffusion Model\",\"Language Model\"]","has_code":false}