{"ID":2863774,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.25035","arxiv_id":"2509.25035","title":"Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct","abstract":"Fast and high-quality language generation is the holy grail that people pursue in the age of AI. In this work, we introduce Discrete Diffusion Divergence Instruct (DiDi-Instruct), a training-based method that initializes from a pre-trained diffusion large language model (dLLM) and distills a few-step student for fast generation. The model distilled with DiDi-Instruct matches or surpasses its dLLM teacher and the GPT-2 baseline while providing up to 64$\\times$ acceleration. The theoretical foundation of DiDi-Instruct is a novel framework based on integral KL-divergence minimization, which leads to a practical training algorithm. We further introduce grouped reward normalization, intermediate-state matching, and the reward-guided ancestral sampler to improve training stability, model coverage, and inference quality. On the OpenWebText benchmark, DiDi-Instruct achieves perplexity ranging from 62.2 (8 NFEs) to 18.4 (128 NFEs), outperforming prior accelerated dLLMs and the GPT-2 baseline. These gains incur a negligible entropy loss (around $1$%) and reduce additional training wall-clock time by more than $20\\times$ compared to competing dLLM distillation methods. We further validate the robustness and effectiveness of DiDi-Instruct through extensive ablation studies, model scaling, downstream task evaluations, and unconditional protein sequence generation. In conclusion, DiDi-Instruct enables efficient and effective distillation for language generation in the blink of an eye.","short_abstract":"Fast and high-quality language generation is the holy grail that people pursue in the age of AI. In this work, we introduce Discrete Diffusion Divergence Instruct (DiDi-Instruct), a training-based method that initializes from a pre-trained diffusion large language model (dLLM) and distills a few-step student for fast g...","url_abs":"https://arxiv.org/abs/2509.25035","url_pdf":"https://arxiv.org/pdf/2509.25035v3","authors":"[\"Haoyang Zheng\",\"Xinyang Liu\",\"Cindy Xiangrui Kong\",\"Nan Jiang\",\"Zheyuan Hu\",\"Weijian Luo\",\"Wei Deng\",\"Guang Lin\"]","published":"2025-09-29T16:55:44Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Diffusion Model\",\"Large Language Model\",\"Language Model\"]","has_code":false}
