{"ID":2852300,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.18692","arxiv_id":"2510.18692","title":"MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation","abstract":"Long video generation with Diffusion Transformers (DiTs) is bottlenecked by the quadratic scaling of full attention with sequence length. Since attention is highly redundant, outputs are dominated by a small subset of query-key pairs. Existing sparse methods rely on blockwise coarse estimation, whose accuracy-efficiency trade-offs are constrained by block size. This paper introduces Mixture-of-Groups Attention (MoGA), an efficient sparse attention that uses a lightweight, learnable token router to precisely match tokens without blockwise estimation. Through semantic-aware routing, MoGA enables effective long-range interactions. As a kernel-free method, MoGA integrates seamlessly with modern attention stacks, including FlashAttention and sequence parallelism. Building on MoGA, we develop an efficient long video generation model that end-to-end produces minute-level, multi-shot, 480p videos at 24 fps, with a context length of approximately 580k. Comprehensive experiments on various video generation tasks validate the effectiveness of our approach.","short_abstract":"Long video generation with Diffusion Transformers (DiTs) is bottlenecked by the quadratic scaling of full attention with sequence length. Since attention is highly redundant, outputs are dominated by a small subset of query-key pairs. Existing sparse methods rely on blockwise coarse estimation, whose accuracy-efficienc...","url_abs":"https://arxiv.org/abs/2510.18692","url_pdf":"https://arxiv.org/pdf/2510.18692v1","authors":"[\"Weinan Jia\",\"Yuning Lu\",\"Mengqi Huang\",\"Hualiang Wang\",\"Binyuan Huang\",\"Nan Chen\",\"Mu Liu\",\"Jidong Jiang\",\"Zhendong Mao\"]","published":"2025-10-21T14:50:42Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\",\"Transformer\"]","has_code":false}
