{"ID":2833360,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.05150","arxiv_id":"2512.05150","title":"TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows","abstract":"Recent advances in large multi-modal generative models have demonstrated impressive capabilities in multi-modal generation, including image and video generation. These models are typically built upon multi-step frameworks like diffusion and flow matching, which inherently limits their inference efficiency (requiring 40-100 Number of Function Evaluations (NFEs)). While various few-step methods aim to accelerate the inference, existing solutions have clear limitations. Prominent distillation-based methods, such as progressive and consistency distillation, either require an iterative distillation procedure or show significant degradation at very few steps (\u003c 4-NFE). Meanwhile, integrating adversarial training into distillation (e.g., DMD/DMD2 and SANA-Sprint) to enhance performance introduces training instability, added complexity, and high GPU memory overhead due to the auxiliary trained models. To this end, we propose TwinFlow, a simple yet effective framework for training 1-step generative models that bypasses the need of fixed pretrained teacher models and avoids standard adversarial networks during training, making it ideal for building large-scale, efficient models. On text-to-image tasks, our method achieves a GenEval score of 0.83 in 1-NFE, outperforming strong baselines like SANA-Sprint (a GAN loss-based framework) and RCGM (a consistency-based framework). Notably, we demonstrate the scalability of TwinFlow by full-parameter training on Qwen-Image-20B and transform it into an efficient few-step generator. With just 1-NFE, our approach matches the performance of the original 100-NFE model on both the GenEval and DPG-Bench benchmarks, reducing computational cost by $100\\times$ with minor quality degradation. Project page is available at https://zhenglin-cheng.com/twinflow.","short_abstract":"Recent advances in large multi-modal generative models have demonstrated impressive capabilities in multi-modal generation, including image and video generation. These models are typically built upon multi-step frameworks like diffusion and flow matching, which inherently limits their inference efficiency (requiring 40...","url_abs":"https://arxiv.org/abs/2512.05150","url_pdf":"https://arxiv.org/pdf/2512.05150v2","authors":"[\"Zhenglin Cheng\",\"Peng Sun\",\"Jianguo Li\",\"Tao Lin\"]","published":"2025-12-03T07:45:46Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\",\"Generative Adversarial Network\"]","project_urls":"[\"https://zhenglin-cheng.com/twinflow\"]","has_code":false,"code_links":[{"ID":606315,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2833360,"paper_url":"https://arxiv.org/abs/2512.05150","paper_title":"TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows","repo_url":"https://github.com/inclusionAI/TwinFlow","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0},{"ID":606316,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2833360,"paper_url":"https://arxiv.org/abs/2512.05150","paper_title":"TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows","repo_url":"https://github.com/LINs-lab/RCGM","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
