{"ID":2880792,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.14148","arxiv_id":"2508.14148","title":"DPad: Efficient Diffusion Language Models with Suffix Dropout","abstract":"Diffusion-based Large Language Models (dLLMs) parallelize text generation by framing decoding as a denoising process, but suffer from high computational overhead since they predict all future suffix tokens at each step while retaining only a small fraction. We propose Diffusion Scratchpad (DPad), a training-free method that restricts attention to a small set of nearby suffix tokens, preserving fidelity while eliminating redundancy. DPad integrates two strategies: (i) a sliding window, which maintains a fixed-length suffix window, and (ii) distance-decay dropout, which deterministically removes distant suffix tokens before attention computation. This simple design is compatible with existing optimizations such as prefix caching and can be implemented with only a few lines of code. Comprehensive evaluations across multiple benchmarks on LLaDA-1.5 and Dream models demonstrate that DPad delivers up to $\\mathbf{61.4\\times}$ speedup over vanilla dLLMs while maintaining comparable accuracy, highlighting its potential for efficient and scalable long-sequence inference. Our code is available at https://github.com/Crys-Chen/DPad.","short_abstract":"Diffusion-based Large Language Models (dLLMs) parallelize text generation by framing decoding as a denoising process, but suffer from high computational overhead since they predict all future suffix tokens at each step while retaining only a small fraction.\nWe propose Diffusion Scratchpad (DPad), a training-free method...","url_abs":"https://arxiv.org/abs/2508.14148","url_pdf":"https://arxiv.org/pdf/2508.14148v2","authors":"[\"Xinhua Chen\",\"Sitao Huang\",\"Cong Guo\",\"Chiyue Wei\",\"Yintao He\",\"Jianyi Zhang\",\"Hai \\\"Helen\\\" Li\",\"Yiran Chen\"]","published":"2025-08-19T16:56:51Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.LG\"]","methods":"[\"Diffusion Model\",\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":610726,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2880792,"paper_url":"https://arxiv.org/abs/2508.14148","paper_title":"DPad: Efficient Diffusion Language Models with Suffix Dropout","repo_url":"https://github.com/Crys-Chen/DPad","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
