{"ID":2881186,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.13021","arxiv_id":"2508.13021","title":"Empirical Analysis of Decoding Biases in Masked Diffusion Models","abstract":"Masked diffusion models (MDMs), which leverage bidirectional attention and a denoising process, are narrowing the performance gap with autoregressive models (ARMs). However, their internal attention mechanisms remain under-explored. This paper investigates the attention behaviors in MDMs, revealing the phenomenon of Attention Floating. Unlike ARMs, where attention converges to a fixed sink, MDMs exhibit dynamic, dispersed attention anchors that shift across denoising steps and layers. Further analysis reveals its Shallow Structure-Aware, Deep Content-Focused attention mechanism: shallow layers utilize floating tokens to build a global structural framework, while deeper layers allocate more capability toward capturing semantic content. Empirically, this distinctive attention pattern provides a mechanistic explanation for the strong in-context learning capabilities of MDMs, allowing them to double the performance compared to ARMs in knowledge-intensive tasks. All codes are available at https://github.com/NEUIR/Uncode.","short_abstract":"Masked diffusion models (MDMs), which leverage bidirectional attention and a denoising process, are narrowing the performance gap with autoregressive models (ARMs). However, their internal attention mechanisms remain under-explored. This paper investigates the attention behaviors in MDMs, revealing the phenomenon of At...","url_abs":"https://arxiv.org/abs/2508.13021","url_pdf":"https://arxiv.org/pdf/2508.13021v3","authors":"[\"Pengcheng Huang\",\"Tianming Liu\",\"Zhenghao Liu\",\"Yukun Yan\",\"Shuo Wang\",\"Tong Xiao\",\"Zulong Chen\",\"Maosong Sun\"]","published":"2025-08-18T15:38:37Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\"]","methods":"[\"Diffusion Model\"]","has_code":false,"code_links":[{"ID":610782,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2881186,"paper_url":"https://arxiv.org/abs/2508.13021","paper_title":"Empirical Analysis of Decoding Biases in Masked Diffusion Models","repo_url":"https://github.com/NEUIR/Uncode","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
