{"ID":2836100,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.22699","arxiv_id":"2511.22699","title":"Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer","abstract":"The landscape of high-performance image generation models is currently dominated by proprietary systems, such as Nano Banana Pro and Seedream 4.0. Leading open-source alternatives, including Qwen-Image, Hunyuan-Image-3.0 and FLUX.2, are characterized by massive parameter counts (20B to 80B), making them impractical for inference, and fine-tuning on consumer-grade hardware. To address this gap, we propose Z-Image, an efficient 6B-parameter foundation generative model built upon a Scalable Single-Stream Diffusion Transformer (S3-DiT) architecture that challenges the \"scale-at-all-costs\" paradigm. By systematically optimizing the entire model lifecycle -- from a curated data infrastructure to a streamlined training curriculum -- we complete the full training workflow in just 314K H800 GPU hours (approx. $630K). Our few-step distillation scheme with reward post-training further yields Z-Image-Turbo, offering both sub-second inference latency on an enterprise-grade H800 GPU and compatibility with consumer-grade hardware (\u003c16GB VRAM). Additionally, our omni-pre-training paradigm also enables efficient training of Z-Image-Edit, an editing model with impressive instruction-following capabilities. Both qualitative and quantitative experiments demonstrate that our model achieves performance comparable to or surpassing that of leading competitors across various dimensions. Most notably, Z-Image exhibits exceptional capabilities in photorealistic image generation and bilingual text rendering, delivering results that rival top-tier commercial models, thereby demonstrating that state-of-the-art results are achievable with significantly reduced computational overhead. We publicly release our code, weights, and online demo to foster the development of accessible, budget-friendly, yet state-of-the-art generative models.","short_abstract":"The landscape of high-performance image generation models is currently dominated by proprietary systems, such as Nano Banana Pro and Seedream 4.0. Leading open-source alternatives, including Qwen-Image, Hunyuan-Image-3.0 and FLUX.2, are characterized by massive parameter counts (20B to 80B), making them impractical for...","url_abs":"https://arxiv.org/abs/2511.22699","url_pdf":"https://arxiv.org/pdf/2511.22699v3","authors":"[\"Z-Image Team\",\"Huanqia Cai\",\"Sihan Cao\",\"Ruoyi Du\",\"Peng Gao\",\"Steven Hoi\",\"Zhaohui Hou\",\"Shijie Huang\",\"Dengyang Jiang\",\"Xin Jin\",\"Liangchen Li\",\"Zhen Li\",\"Zhong-Yu Li\",\"David Liu\",\"Dongyang Liu\",\"Junhan Shi\",\"Qilong Wu\",\"Feng Yu\",\"Chi Zhang\",\"Shifeng Zhang\",\"Shilin Zhou\"]","published":"2025-11-27T18:52:07Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\",\"Transformer\"]","has_code":false}
