{"ID":2864993,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.21835","arxiv_id":"2509.21835","title":"On the $ε$-Free Inference Complexity of Absorbing Discrete Diffusion","abstract":"Absorbing discrete diffusion has emerged as a dominant framework for discrete data generation. However, a significant disparity remains between its empirical success and theoretical understanding: existing analyses fail to demonstrate a complexity advantage over the $\\mathcal{O}(d \\ln(d/ε))$ baseline established for \\emph{uniform} discrete diffusion. We bridge this gap by identifying a critical structural advantage: whereas uniform diffusion redundantly re-denoises valid elements, the absorbing scheme denoises each absorbing state exactly once. Leveraging this insight, we introduce \\emph{Absorbing-Aware Truncated Uniformization} (AATU). We prove that AATU achieves $ε$-TV convergence with $\\mathcal{O}(d \\ln d)$ complexity-\\emph{independent} of the error tolerance $ε$-thereby strictly outperforming existing uniform baselines. Beyond improving convergence rates, our analysis eliminates the restrictive bounded-score assumption commonly required in prior studies of uniformization-based inference. Furthermore, we extend AATU to time-invariant parameterizations, showing that it naturally adopts an imputation-type inference with a uniformly randomized denoising order. When combined with a lazy update strategy, TV convergence requires only $\\mathcal{O}(d)$ discrete score evaluations. These results not only establish a rigorous foundation for absorbing discrete diffusion -- confirming its efficiency in high-accuracy generation -- but also open new avenues for analyzing diffusion-based language models under the masking paradigm.","short_abstract":"Absorbing discrete diffusion has emerged as a dominant framework for discrete data generation. However, a significant disparity remains between its empirical success and theoretical understanding: existing analyses fail to demonstrate a complexity advantage over the $\\mathcal{O}(d \\ln(d/ε))$ baseline established for \\e...","url_abs":"https://arxiv.org/abs/2509.21835","url_pdf":"https://arxiv.org/pdf/2509.21835v2","authors":"[\"Xunpeng Huang\",\"Yingyu Lin\",\"Nishant Jain\",\"Kaibo Wang\",\"Difan Zou\",\"Yian Ma\",\"Tong Zhang\"]","published":"2025-09-26T03:50:17Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Diffusion Model\",\"Language Model\"]","has_code":false}