{"ID":2858283,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.08431","arxiv_id":"2510.08431","title":"Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency","abstract":"Although continuous-time consistency models (e.g., sCM, MeanFlow) are theoretically principled and empirically powerful for fast academic-scale diffusion, their applicability to large-scale text-to-image and video tasks remains unclear due to infrastructure challenges in Jacobian-vector product (JVP) computation and the limitations of evaluation benchmarks like FID. This work represents the first effort to scale up continuous-time consistency to general application-level image and video diffusion models, and to make JVP-based distillation effective at large scale. We first develop a parallelism-compatible FlashAttention-2 JVP kernel, enabling sCM training on models with over 10 billion parameters and high-dimensional video tasks. Our investigation reveals fundamental quality limitations of sCM in fine-detail generation, which we attribute to error accumulation and the \"mode-covering\" nature of its forward-divergence objective. To remedy this, we propose the score-regularized continuous-time consistency model (rCM), which incorporates score distillation as a long-skip regularizer. This integration complements sCM with the \"mode-seeking\" reverse divergence, effectively improving visual quality while maintaining high generation diversity. Validated on large-scale models (Cosmos-Predict2, Wan2.1) up to 14B parameters and 5-second videos, rCM generally matches the state-of-the-art distillation method DMD2 on quality metrics while mitigating mode collapse and offering notable advantages in diversity, all without GAN tuning or extensive hyperparameter searches. The distilled models generate high-fidelity samples in only $1\\sim4$ steps, accelerating diffusion sampling by $15\\times\\sim50\\times$. These results position rCM as a practical and theoretically grounded framework for advancing large-scale diffusion distillation. Code is available at https://github.com/NVlabs/rcm.","short_abstract":"Although continuous-time consistency models (e.g., sCM, MeanFlow) are theoretically principled and empirically powerful for fast academic-scale diffusion, their applicability to large-scale text-to-image and video tasks remains unclear due to infrastructure challenges in Jacobian-vector product (JVP) computation and the...","url_abs":"https://arxiv.org/abs/2510.08431","url_pdf":"https://arxiv.org/pdf/2510.08431v3","authors":"[\"Kaiwen Zheng\",\"Yuji Wang\",\"Qianli Ma\",\"Huayu Chen\",\"Jintao Zhang\",\"Yogesh Balaji\",\"Jianfei Chen\",\"Ming-Yu Liu\",\"Jun Zhu\",\"Qinsheng Zhang\"]","published":"2025-10-09T16:45:30Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.LG\"]","methods":"[\"Diffusion Model\",\"Generative Adversarial Network\"]","has_code":false,"code_links":[{"ID":608531,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2858283,"paper_url":"https://arxiv.org/abs/2510.08431","paper_title":"Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency","repo_url":"https://github.com/NVlabs/rcm","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
