{"ID":2883895,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.08227","arxiv_id":"2508.08227","title":"OMGSR: You Only Need One Mid-timestep Guidance for Real-World Image Super-Resolution","abstract":"Denoising Diffusion Probabilistic Models (DDPMs) show promising potential in one-step Real-World Image Super-Resolution (Real-ISR). Current one-step Real-ISR methods typically inject the low-quality (LQ) image latent representation at the start or end timestep of the DDPM scheduler. Recent studies have begun to note that the LQ image latent and the pre-trained noisy latent representations are intuitively closer at a mid-timestep. However, a quantitative analysis of these latent representations remains lacking. Considering these latent representations can be decomposed into signal and noise, we propose a method based on the Signal-to-Noise Ratio (SNR) to pre-compute an average optimal mid-timestep for injection. To better approximate the pre-trained noisy latent representation, we further introduce the Latent Representation Refinement (LRR) loss via a LoRA-enhanced VAE encoder. We also fine-tune the backbone of the DDPM-based generative model using LoRA to perform one-step denoising at the average optimal mid-timestep. Based on these components, we present OMGSR, a GAN-based Real-ISR framework that employs a DDPM-based generative model as the generator and a DINOv3-ConvNeXt model with multi-level discriminator heads as the discriminator. We also propose the DINOv3-ConvNeXt DISTS (Dv3CD) loss, which is enhanced for structural perception at varying resolutions. Within the OMGSR framework, we develop OMGSR-S based on SD2.1-base. An ablation study confirms that our pre-computation strategy and LRR loss significantly improve the baseline. Comparative studies demonstrate that OMGSR-S achieves state-of-the-art performance across multiple metrics. 
Code is available at https://github.com/wuer5/OMGSR.","short_abstract":"Denoising Diffusion Probabilistic Models (DDPMs) show promising potential in one-step Real-World Image Super-Resolution (Real-ISR). Current one-step Real-ISR methods typically inject the low-quality (LQ) image latent representation at the start or end timestep of the DDPM scheduler. Recent studies have begun to note th...","url_abs":"https://arxiv.org/abs/2508.08227","url_pdf":"https://arxiv.org/pdf/2508.08227v2","authors":"[\"Zhiqiang Wu\",\"Zhaomang Sun\",\"Tong Zhou\",\"Bingtao Fu\",\"Ji Cong\",\"Yitong Dong\",\"Huaqi Zhang\",\"Xuan Tang\",\"Mingsong Chen\",\"Xian Wei\"]","published":"2025-08-11T17:44:59Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Diffusion Model\",\"LoRA\",\"Generative Adversarial Network\",\"Variational Autoencoder\"]","has_code":false,"code_links":[{"ID":611034,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2883895,"paper_url":"https://arxiv.org/abs/2508.08227","paper_title":"OMGSR: You Only Need One Mid-timestep Guidance for Real-World Image Super-Resolution","repo_url":"https://github.com/wuer5/OMGSR","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}