{"ID":2889995,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.20388","arxiv_id":"2507.20388","title":"ModalFormer: Multimodal Transformer for Low-Light Image Enhancement","abstract":"Low-light image enhancement (LLIE) is a fundamental yet challenging task due to the presence of noise, loss of detail, and poor contrast in images captured under insufficient lighting conditions. Recent methods often rely solely on pixel-level transformations of RGB images, neglecting the rich contextual information available from multiple visual modalities. In this paper, we present ModalFormer, the first large-scale multimodal framework for LLIE that fully exploits nine auxiliary modalities to achieve state-of-the-art performance. Our model comprises two main components: a Cross-modal Transformer (CM-T) designed to restore corrupted images while seamlessly integrating multimodal information, and multiple auxiliary subnetworks dedicated to multimodal feature reconstruction. Central to the CM-T is our novel Cross-modal Multi-headed Self-Attention mechanism (CM-MSA), which effectively fuses RGB data with modality-specific features--including deep feature embeddings, segmentation information, geometric cues, and color information--to generate information-rich hybrid attention maps. Extensive experiments on multiple benchmark datasets demonstrate ModalFormer's state-of-the-art performance in LLIE. Pre-trained models and results are made available at https://github.com/albrateanu/ModalFormer.","short_abstract":"Low-light image enhancement (LLIE) is a fundamental yet challenging task due to the presence of noise, loss of detail, and poor contrast in images captured under insufficient lighting conditions. Recent methods often rely solely on pixel-level transformations of RGB images, neglecting the rich contextual information av...","url_abs":"https://arxiv.org/abs/2507.20388","url_pdf":"https://arxiv.org/pdf/2507.20388v1","authors":"[\"Alexandru Brateanu\",\"Raul Balmez\",\"Ciprian Orhei\",\"Codruta Ancuti\",\"Cosmin Ancuti\"]","published":"2025-07-27T19:07:22Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Transformer\"]","has_code":false,"code_links":[{"ID":611729,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2889995,"paper_url":"https://arxiv.org/abs/2507.20388","paper_title":"ModalFormer: Multimodal Transformer for Low-Light Image Enhancement","repo_url":"https://github.com/albrateanu/ModalFormer","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}