{"ID":2891468,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.17801","arxiv_id":"2507.17801","title":"Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling","abstract":"We present Lumina-mGPT 2.0, a stand-alone, decoder-only autoregressive model that revisits and revitalizes the autoregressive paradigm for high-quality image generation and beyond. Unlike existing approaches that rely on pretrained components or hybrid architectures, Lumina-mGPT 2.0 is trained entirely from scratch, enabling unrestricted architectural design and licensing freedom. It achieves generation quality on par with state-of-the-art diffusion models such as DALL-E 3 and SANA, while preserving the inherent flexibility and compositionality of autoregressive modeling. Our unified tokenization scheme allows the model to seamlessly handle a wide spectrum of tasks (including subject-driven generation, image editing, controllable synthesis, and dense prediction) within a single generative framework. To further boost usability, we incorporate efficient decoding strategies like inference-time scaling and speculative Jacobi sampling to improve quality and speed, respectively. Extensive evaluations on standard text-to-image benchmarks (e.g., GenEval, DPG) demonstrate that Lumina-mGPT 2.0 not only matches but in some cases surpasses diffusion-based models. Moreover, we confirm its multi-task capabilities on the Graph200K benchmark, with the native Lumina-mGPT 2.0 performing exceptionally well. These results position Lumina-mGPT 2.0 as a strong, flexible foundation model for unified multimodal generation. We have released our training details, code, and models at https://github.com/Alpha-VLLM/Lumina-mGPT-2.0.","short_abstract":"We present Lumina-mGPT 2.0, a stand-alone, decoder-only autoregressive model that revisits and revitalizes the autoregressive paradigm for high-quality image generation and beyond. Unlike existing approaches that rely on pretrained components or hybrid architectures, Lumina-mGPT 2.0 is trained entirely from scratch, en...","url_abs":"https://arxiv.org/abs/2507.17801","url_pdf":"https://arxiv.org/pdf/2507.17801v1","authors":"[\"Yi Xin\",\"Juncheng Yan\",\"Qi Qin\",\"Zhen Li\",\"Dongyang Liu\",\"Shicheng Li\",\"Victor Shea-Jay Huang\",\"Yupeng Zhou\",\"Renrui Zhang\",\"Le Zhuo\",\"Tiancheng Han\",\"Xiaoqing Sun\",\"Siqi Luo\",\"Mengmeng Wang\",\"Bin Fu\",\"Yuewen Cao\",\"Hongsheng Li\",\"Guangtao Zhai\",\"Xiaohong Liu\",\"Yu Qiao\",\"Peng Gao\"]","published":"2025-07-23T17:42:13Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\",\"Large Language Model\"]","has_code":false,"code_links":[{"ID":611883,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2891468,"paper_url":"https://arxiv.org/abs/2507.17801","paper_title":"Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling","repo_url":"https://github.com/Alpha-VLLM/Lumina-mGPT-2.0","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
