{"ID":2857021,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.13649","arxiv_id":"2510.13649","title":"Local-Global Context-Aware and Structure-Preserving Image Super-Resolution","abstract":"Diffusion models have recently achieved significant success in various image manipulation tasks, including image super-resolution and perceptual quality enhancement. Pretrained text-to-image models, such as Stable Diffusion, have exhibited strong capabilities in synthesizing realistic image content, which makes them particularly attractive for addressing super-resolution tasks. While some existing approaches leverage these models to achieve state-of-the-art results, they often struggle when applied to diverse and highly degraded images, leading to noise amplification or incorrect content generation. To address these limitations, we propose a contextually precise image super-resolution framework that effectively maintains both local and global pixel relationships through Local-Global Context-Aware Attention, enabling the generation of high-quality images. Furthermore, we propose a distribution- and perceptual-aligned conditioning mechanism in the pixel space to enhance perceptual fidelity. This mechanism captures fine-grained pixel-level representations while progressively preserving and refining structural information, transitioning from local content details to the global structural composition. During inference, our method generates high-quality images that are structurally consistent with the original content, mitigating artifacts and ensuring realistic detail restoration. Extensive experiments on multiple super-resolution benchmarks demonstrate the effectiveness of our approach in producing high-fidelity, perceptually accurate reconstructions.","short_abstract":"Diffusion models have recently achieved significant success in various image manipulation tasks, including image super-resolution and perceptual quality enhancement. Pretrained text-to-image models, such as Stable Diffusion, have exhibited strong capabilities in synthesizing realistic image content, which makes them pa...","url_abs":"https://arxiv.org/abs/2510.13649","url_pdf":"https://arxiv.org/pdf/2510.13649v1","authors":"[\"Sanchar Palit\",\"Subhasis Chaudhuri\",\"Biplab Banerjee\"]","published":"2025-10-11T07:17:31Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false}