{"ID":2890736,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.18064","arxiv_id":"2507.18064","title":"Adapting Large VLMs with Iterative and Manual Instructions for Generative Low-light Enhancement","abstract":"Most existing low-light image enhancement (LLIE) methods rely on pre-trained model priors, low-light inputs, or both, while neglecting the semantic guidance available from normal-light images. This limitation hinders their effectiveness in complex lighting conditions. In this paper, we propose VLM-IMI, a framework that adapts large vision-language models with iterative and manual instructions for generative LLIE. VLM-IMI mainly contains two branches: Normal-Light Instruction Prior Generation (NL-IPG) and Instruction-aware Light Enhancement Diffusion (IA-LED). The NL-IPG incorporates textual descriptions of the desired normal-light content as enhancement cues, enabling semantically informed restoration. IA-LED incorporates instruction priors from the NL-IPG to guide the diffusion process, enabling precise illumination enhancement. To effectively integrate cross-modal priors, we introduce a learnable instruction prior fusion module, which dynamically aligns and fuses image and text features, promoting the generation of detailed and semantically coherent outputs. During inference, as the ground-truth normal-light images are not available, we propose an inference with an iterative instructions strategy to refine textual instructions, progressively improving visual quality. Our VLM-IMI also inherently supports manual instruction control by allowing users to directly input custom instructions into the LLM to generate user-expected outputs. Experiments across diverse scenarios demonstrate that VLM-IMI outperforms SOTA methods in terms of perception and realism. The source code is available at: https://github.com/sunxiaoran01/VLM-IMI.","short_abstract":"Most existing low-light image enhancement (LLIE) methods rely on pre-trained model priors, low-light inputs, or both, while neglecting the semantic guidance available from normal-light images. This limitation hinders their effectiveness in complex lighting conditions. In this paper, we propose VLM-IMI, a framework that...","url_abs":"https://arxiv.org/abs/2507.18064","url_pdf":"https://arxiv.org/pdf/2507.18064v2","authors":"[\"Xiaoran Sun\",\"Liyan Wang\",\"Yeying Jin\",\"Kin-man Lam\",\"Zhixun Su\",\"Yang Yang\",\"Jinshan Pan\",\"Cong Wang\"]","published":"2025-07-24T03:35:20Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\",\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":611806,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2890736,"paper_url":"https://arxiv.org/abs/2507.18064","paper_title":"Adapting Large VLMs with Iterative and Manual Instructions for Generative Low-light Enhancement","repo_url":"https://github.com/sunxiaoran01/VLM-IMI","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
