{"ID":2823061,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.01141","arxiv_id":"2601.01141","title":"YODA: Yet Another One-step Diffusion-based Video Compressor","abstract":"While one-step diffusion models have recently excelled in perceptual image compression, their application to video remains limited. Prior efforts typically rely on pretrained 2D autoencoders that generate per-frame latent representations independently, thereby neglecting temporal dependencies. We present YODA--Yet Another One-step Diffusion-based Video Compressor--which embeds multiscale features from temporal references for both latent generation and latent coding to better exploit spatial-temporal correlations for more compact representation, and employs a linear Diffusion Transformer (DiT) for efficient one-step denoising. YODA achieves state-of-the-art perceptual performance, consistently outperforming traditional and deep-learning baselines on LPIPS, DISTS, FID, and KID. Source code will be publicly available at https://github.com/NJUVISION/YODA.","short_abstract":"While one-step diffusion models have recently excelled in perceptual image compression, their application to video remains limited. Prior efforts typically rely on pretrained 2D autoencoders that generate per-frame latent representations independently, thereby neglecting temporal dependencies. \nWe present YODA--Yet Anot...","url_abs":"https://arxiv.org/abs/2601.01141","url_pdf":"https://arxiv.org/pdf/2601.01141v1","authors":"[\"Xingchen Li\",\"Junzhe Zhang\",\"Junqi Shi\",\"Ming Lu\",\"Zhan Ma\"]","published":"2026-01-03T10:12:07Z","proceeding":"eess.IV","tasks":"[\"eess.IV\",\"cs.CV\"]","methods":"[\"Diffusion Model\",\"Transformer\"]","has_code":false,"code_links":[{"ID":605467,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2823061,"paper_url":"https://arxiv.org/abs/2601.01141","paper_title":"YODA: Yet Another One-step Diffusion-based Video Compressor","repo_url":"https://github.com/NJUVISION/YODA","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}