{"ID":2863115,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.00294","arxiv_id":"2510.00294","title":"Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models","abstract":"Diffusion Large Language Models (DLLMs) have emerged as a new paradigm of language modeling beyond autoregressive next-token prediction. Taking advantage of their inherent modeling foundations, DLLMs have the great potential of efficient inference with parallel decoding algorithms, which enable multi-token prediction. However, the high generation quality often requires the number of decoding steps equal to the sequence length, which performs a one-token-per-step decoding, and existing parallel decoding algorithms, which yield suboptimal decoding paths, bring inference speedup at the cost of non-negligible performance degradation. To overcome this challenge, we introduce Free Draft-and-Verification (FreeDave), a novel fast decoding algorithm tailored for DLLMs that achieves lossless parallel decoding without any model modification or extra modules. Specifically, we propose an algorithm of parallel-decoded candidate generation and verification, which is theoretically guaranteed to use the fewest model forward calls to reproduce the same sequence generated by one-token-per-step decoding. By extensive evaluations on math reasoning and code generation benchmarks across different DLLMs, FreeDave is proven to accelerate the inference up to $2.83\\times$ without performance degradation.","short_abstract":"Diffusion Large Language Models (DLLMs) have emerged as a new paradigm of language modeling beyond autoregressive next-token prediction. Taking advantage of their inherent modeling foundations, DLLMs have the great potential of efficient inference with parallel decoding algorithms, which enable multi-token prediction....","url_abs":"https://arxiv.org/abs/2510.00294","url_pdf":"https://arxiv.org/pdf/2510.00294v3","authors":"[\"Shutong Wu\",\"Jiawei Zhang\"]","published":"2025-09-30T21:28:04Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Diffusion Model\",\"Large Language Model\",\"Language Model\"]","has_code":false}
