{"ID":2829934,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.11749","arxiv_id":"2512.11749","title":"SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder","abstract":"Visual generation grounded in Visual Foundation Model (VFM) representations offers a highly promising unified pathway for integrating visual understanding, perception, and generation. Despite this potential, training large-scale text-to-image diffusion models entirely within the VFM representation space remains largely unexplored. To bridge this gap, we scale the SVG (Self-supervised representations for Visual Generation) framework, proposing SVG-T2I to support high-quality text-to-image synthesis directly in the VFM feature domain. By leveraging a standard text-to-image diffusion pipeline, SVG-T2I achieves competitive performance, reaching 0.75 on GenEval and 85.78 on DPG-Bench. This performance validates the intrinsic representational power of VFMs for generative tasks. We fully open-source the project, including the autoencoder and generation model, together with their training, inference, evaluation pipelines, and pre-trained weights, to facilitate further research in representation-driven visual generation.","short_abstract":"Visual generation grounded in Visual Foundation Model (VFM) representations offers a highly promising unified pathway for integrating visual understanding, perception, and generation. Despite this potential, training large-scale text-to-image diffusion models entirely within the VFM representation space remains largely...","url_abs":"https://arxiv.org/abs/2512.11749","url_pdf":"https://arxiv.org/pdf/2512.11749v1","authors":"[\"Minglei Shi\",\"Haolin Wang\",\"Borui Zhang\",\"Wenzhao Zheng\",\"Bohan Zeng\",\"Ziyang Yuan\",\"Xiaoshi Wu\",\"Yuanxing Zhang\",\"Huan Yang\",\"Xintao Wang\",\"Pengfei Wan\",\"Kun Gai\",\"Jie Zhou\",\"Jiwen Lu\"]","published":"2025-12-12T17:45:03Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false}
