{"ID":3052304,"CreatedAt":"2026-06-04T04:41:36.695875263Z","UpdatedAt":"2026-06-06T04:39:12.706778348Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04446","arxiv_id":"2606.04446","title":"D^2SD: Accelerating Speculative Decoding with Dual Diffusion Draft Models","abstract":"Speculative decoding accelerates autoregressive large language model inference by drafting multiple tokens and verifying them in a single target-model forward pass. Recent diffusion-based drafters generate an entire block of tokens in parallel but usually commit to a single draft sequence per verification: once the first mismatch occurs, all subsequent draft tokens are discarded, resulting in a limited acceptance rate. Naively batching more draft candidate sequences only introduces a marginal improvement, as redundant or poorly placed branches increase the cost of drafting and verification without proportionally increasing the number of accepted tokens. We propose D^2SD, a dual diffusion draft speculative decoding framework that organizes candidates into a confidence-guided prefix tree, where the first diffusion drafter generates a block along with per-position confidence scores that are used to identify the most likely rejection boundary and select the top-K prefix ranges for recovery; the second variable-prefix diffusion drafter re-anchors at each selected prefix and proposes alternative continuations in one batched pass; the resulting shared-prefix candidates are jointly verified via cascade attention. Empirically, D^2SD shows clear improvements over both the underlying diffusion approach and strong autoregressive speculative decoding baselines.","short_abstract":"Speculative decoding accelerates autoregressive large language model inference by drafting multiple tokens and verifying them in a single target-model forward pass. Recent diffusion-based drafters generate an entire block of tokens in parallel but usually commit to a single draft sequence per verification: once the fir...","url_abs":"https://arxiv.org/abs/2606.04446","url_pdf":"https://arxiv.org/pdf/2606.04446v1","authors":"[\"Liyuan Zhang\",\"Jiarui Zhang\",\"Jinwei Yao\",\"Ran Yan\",\"Yuchen Yang\",\"Jiahao Zhang\",\"Tongkai Yang\",\"Yi Wu\",\"Binhang Yuan\"]","published":"2026-06-03T04:48:00Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.LG\"]","methods":"[\"Diffusion Model\",\"Language Model\",\"Generative Adversarial Network\"]","has_code":false}
