{"ID":2862122,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.00948","arxiv_id":"2510.00948","title":"InfVSR: Toward Consistency-Driven Streaming Generative Video Super-Resolution","abstract":"Real-world videos often extend over thousands of frames. Existing generative video super-resolution (VSR) approaches, however, face two persistent challenges when processing long sequences: (1) inefficiency due to the heavy cost of multi-step denoising for full-length sequences; and (2) poor consistency due to temporal decomposition, which causes artifacts and discontinuities. To break these limits, we propose InfVSR, which reformulates VSR as an autoregressive-one-step-diffusion paradigm, and enables streaming inference with video diffusion priors. First, we adapt the pretrained DiT into a causal structure, maintaining both local and global coherence via rolling KV-cache and joint visual guidance. Second, we distill the diffusion process into a single step efficiently, with patch-wise pixel supervision and cross-chunk distribution matching. To fill the gap in long-form video evaluation, we build a new benchmark tailored for extended sequences and further introduce semantic-level metrics to comprehensively assess temporal consistency. Our method pushes the frontier of long-form VSR, achieves state-of-the-art quality with enhanced semantic consistency, and delivers up to 58x speed-up over existing methods such as MGLD-VSR. Our code and models are available at https://github.com/Kai-Liu001/InfVSR.","short_abstract":"Real-world videos often extend over thousands of frames. 
Existing generative video super-resolution (VSR) approaches, however, face two persistent challenges when processing long sequences: (1) inefficiency due to the heavy cost of multi-step denoising for full-length sequences; and (2) poor consistency is hindered by...","url_abs":"https://arxiv.org/abs/2510.00948","url_pdf":"https://arxiv.org/pdf/2510.00948v3","authors":"[\"Ziqing Zhang\",\"Kai Liu\",\"Zheng Chen\",\"Xi Li\",\"Yucong Chen\",\"Bingnan Duan\",\"Linghe Kong\",\"Yulun Zhang\"]","published":"2025-10-01T14:21:45Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false,"code_links":[{"ID":608874,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2862122,"paper_url":"https://arxiv.org/abs/2510.00948","paper_title":"InfVSR: Toward Consistency-Driven Streaming Generative Video Super-Resolution","repo_url":"https://github.com/Kai-Liu001/InfVSR","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}