{"ID":2825892,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.20479","arxiv_id":"2512.20479","title":"UTDesign: A Unified Framework for Stylized Text Editing and Generation in Graphic Design Images","abstract":"AI-assisted graphic design has emerged as a powerful tool for automating the creation and editing of design elements such as posters, banners, and advertisements. While diffusion-based text-to-image models have demonstrated strong capabilities in visual content generation, their text rendering performance, particularly for small-scale typography and non-Latin scripts, remains limited. In this paper, we propose UTDesign, a unified framework for high-precision stylized text editing and conditional text generation in design images, supporting both English and Chinese scripts. Our framework introduces a novel DiT-based text style transfer model trained from scratch on a synthetic dataset, capable of generating transparent RGBA text foregrounds that preserve the style of reference glyphs. We further extend this model into a conditional text generation framework by training a multi-modal condition encoder on a curated dataset with detailed text annotations, enabling accurate, style-consistent text synthesis conditioned on background images, prompts, and layout specifications. Finally, we integrate our approach into a fully automated text-to-design (T2D) pipeline by incorporating pre-trained text-to-image (T2I) models and an MLLM-based layout planner. Extensive experiments demonstrate that UTDesign achieves state-of-the-art performance among open-source methods in terms of stylistic consistency and text accuracy, and also exhibits unique advantages compared to proprietary commercial approaches. \nCode and data for this paper are available at https://github.com/ZYM-PKU/UTDesign.","short_abstract":"AI-assisted graphic design has emerged as a powerful tool for automating the creation and editing of design elements such as posters, banners, and advertisements. While diffusion-based text-to-image models have demonstrated strong capabilities in visual content generation, their text rendering performance, particularly...","url_abs":"https://arxiv.org/abs/2512.20479","url_pdf":"https://arxiv.org/pdf/2512.20479v1","authors":"[\"Yiming Zhao\",\"Yuanpeng Gao\",\"Yuxuan Luo\",\"Jiwei Duan\",\"Shisong Lin\",\"Longfei Xiong\",\"Zhouhui Lian\"]","published":"2025-12-23T16:13:55Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\",\"Large Language Model\"]","has_code":false,"code_links":[{"ID":605706,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2825892,"paper_url":"https://arxiv.org/abs/2512.20479","paper_title":"UTDesign: A Unified Framework for Stylized Text Editing and Generation in Graphic Design Images","repo_url":"https://github.com/ZYM-PKU/UTDesign","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
