{"ID":2825925,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.20573","arxiv_id":"2512.20573","title":"Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs","abstract":"Diffusion Large Language Models (dLLMs) offer fast, parallel token generation, but their standalone use is plagued by an inherent efficiency-quality tradeoff. We show that, if carefully applied, the attributes of dLLMs can actually be a strength for drafters in speculative decoding with autoregressive (AR) verifiers. Our core insight is that dLLM's speed from parallel decoding drastically lowers the risk of costly rejections, providing a practical mechanism to effectively realize the (elusive) lengthy drafts that lead to large speedups with speculative decoding. We present FailFast, a dLLM-based speculative decoding framework that realizes this approach by dynamically adapting its speculation length. It \"fails fast\" by spending minimal compute in hard-to-speculate regions to shrink speculation latency and \"wins big\" by aggressively extending draft lengths in easier regions to reduce verification latency (in many cases, speculating and accepting 70 tokens at a time!). Without any fine-tuning, FailFast delivers lossless acceleration of AR LLMs and achieves up to 4.9$\\times$ speedup over vanilla decoding, 1.7$\\times$ over the best naive dLLM drafter, and 1.7$\\times$ over EAGLE-3 across diverse models and workloads. We open-source FailFast at https://github.com/ruipeterpan/failfast.","short_abstract":"Diffusion Large Language Models (dLLMs) offer fast, parallel token generation, but their standalone use is plagued by an inherent efficiency-quality tradeoff. We show that, if carefully applied, the attributes of dLLMs can actually be a strength for drafters in speculative decoding with autoregressive (AR) verifiers. O...","url_abs":"https://arxiv.org/abs/2512.20573","url_pdf":"https://arxiv.org/pdf/2512.20573v3","authors":"[\"Rui Pan\",\"Zhuofu Chen\",\"Hongyi Liu\",\"Arvind Krishnamurthy\",\"Ravi Netravali\"]","published":"2025-12-23T18:16:58Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.DC\"]","methods":"[\"Diffusion Model\",\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":605709,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2825925,"paper_url":"https://arxiv.org/abs/2512.20573","paper_title":"Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs","repo_url":"https://github.com/ruipeterpan/failfast","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
