{"ID":2860687,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.03929","arxiv_id":"2510.03929","title":"Self-Speculative Masked Diffusions","abstract":"We present self-speculative masked diffusions, a new class of masked diffusion generative models for discrete data that require significantly fewer function evaluations to generate samples. Standard masked diffusion models predict factorized logits over currently masked positions. A number of masked positions are then sampled, however, the factorization approximation means that sampling too many positions in one go leads to poor sample quality. As a result, many simulation steps and therefore neural network function evaluations are required to generate high-quality data. We reduce the computational burden by generating non-factorized predictions over masked positions. This is achieved by modifying the final transformer attention mask from non-causal to causal, enabling draft token generation and parallel validation via a novel, model-integrated speculative sampling mechanism. This results in a non-factorized predictive distribution over masked positions in a single forward pass. We apply our method to GPT2 scale text modelling and protein sequence generation, finding that we can achieve a ~2x reduction in the required number of network forward passes relative to standard masked diffusion models.","short_abstract":"We present self-speculative masked diffusions, a new class of masked diffusion generative models for discrete data that require significantly fewer function evaluations to generate samples. Standard masked diffusion models predict factorized logits over currently masked positions. A number of masked positions are then...","url_abs":"https://arxiv.org/abs/2510.03929","url_pdf":"https://arxiv.org/pdf/2510.03929v2","authors":"[\"Andrew Campbell\",\"Valentin De Bortoli\",\"Jiaxin Shi\",\"Arnaud Doucet\"]","published":"2025-10-04T20:16:38Z","proceeding":"stat.ML","tasks":"[\"stat.ML\",\"cs.LG\"]","methods":"[\"Diffusion Model\",\"Transformer\"]","has_code":false}
