Free Draft-and-Verification: Toward Lossless Parallel Decoding for Diffusion Large Language Models

cs.LG arXiv:2510.00294
View PDF arXiv JSON

Abstract

Diffusion Large Language Models (DLLMs) have emerged as a new paradigm of language modeling beyond autoregressive next-token prediction. Taking advantage of their inherent modeling foundations, DLLMs have the great potential of efficient inference with parallel decoding algorithms, which enable multi-token prediction. However, the high generation quality often requires the number of decoding steps equal to the sequence length, which performs a one-token-per-step decoding, and existing parallel decoding algorithms, which yield suboptimal decoding paths, bring inference speedup at the cost of non-negligible performance degradation. To overcome this challenge, we introduce Free Draft-and-Verification (FreeDave), a novel fast decoding algorithm tailored for DLLMs that achieves lossless parallel decoding without any model modification or extra modules. Specifically, we propose an algorithm of parallel-decoded candidate generation and verification, which is theoretically guaranteed to use the fewest model forward calls to reproduce the same sequence generated by one-token-per-step decoding. By extensive evaluations on math reasoning and code generation benchmarks across different DLLMs, FreeDave is proven to accelerate the inference up to $2.83\times$ without performance degradation.

PDF Viewer