{"ID":2863917,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.25401","arxiv_id":"2509.25401","title":"FlashOmni: A Unified Sparse Attention Engine for Diffusion Transformers","abstract":"Multi-Modal Diffusion Transformers (DiTs) demonstrate exceptional capabilities in visual synthesis, yet their deployment remains constrained by substantial computational demands. To alleviate this bottleneck, many sparsity-based acceleration methods have been proposed. However, their diverse sparsity patterns often require customized kernels for high-performance inference, limiting universality. We propose FlashOmni, a unified sparse attention engine compatible with arbitrary DiT architectures. FlashOmni introduces flexible sparse symbols to standardize the representation of a wide range of sparsity strategies, such as feature caching and block-sparse skipping. This unified abstraction enables the execution of diverse sparse computations within a single attention kernel. In addition, FlashOmni designs optimized sparse GEMMs for attention blocks, leveraging sparse symbols to eliminate redundant computations and further improve efficiency. Experiments demonstrate that FlashOmni delivers near-linear, closely matching the sparsity ratio speedup (1:1) in attention and GEMM-$Q$, and achieves 2.5$\\times$-3.8$\\times$ acceleration in GEMM-$O$ (max peaking at about 87.5% of the theoretical limit). Applied with a multi-granularity sparsity strategy, it enables the Hunyuan model (33K) to achieve about 1.5$\\times$ end-to-end acceleration without degrading visual quality.","short_abstract":"Multi-Modal Diffusion Transformers (DiTs) demonstrate exceptional capabilities in visual synthesis, yet their deployment remains constrained by substantial computational demands. To alleviate this bottleneck, many sparsity-based acceleration methods have been proposed. However, their diverse sparsity patterns often req...","url_abs":"https://arxiv.org/abs/2509.25401","url_pdf":"https://arxiv.org/pdf/2509.25401v1","authors":"[\"Liang Qiao\",\"Yue Dai\",\"Yeqi Huang\",\"Hongyu Kan\",\"Jun Shi\",\"Hong An\"]","published":"2025-09-29T18:57:14Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.PF\"]","methods":"[\"Diffusion Model\",\"Transformer\"]","has_code":false}
