{"ID":2831343,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.08897","arxiv_id":"2512.08897","title":"UniLayDiff: A Unified Diffusion Transformer for Content-Aware Layout Generation","abstract":"Content-aware layout generation is a critical task in graphic design automation, focused on creating visually appealing arrangements of elements that seamlessly blend with a given background image. The variety of real-world applications makes it highly challenging to develop a single model capable of unifying the diverse range of input-constrained generation sub-tasks, such as those conditioned by element types, sizes, or their relationships. Current methods either address only a subset of these tasks or necessitate separate model parameters for different conditions, failing to offer a truly unified solution. In this paper, we propose UniLayDiff: a Unified Diffusion Transformer, that for the first time, addresses various content-aware layout generation tasks with a single, end-to-end trainable model. Specifically, we treat layout constraints as a distinct modality and employ Multi-Modal Diffusion Transformer framework to capture the complex interplay between the background image, layout elements, and diverse constraints. Moreover, we integrate relation constraints through fine-tuning the model with LoRA after pretraining the model on other tasks. Such a schema not only achieves unified conditional generation but also enhances overall layout quality. Extensive experiments demonstrate that UniLayDiff achieves state-of-the-art performance across from unconditional to various conditional generation tasks and, to the best of our knowledge, is the first model to unify the full range of content-aware layout generation tasks.","short_abstract":"Content-aware layout generation is a critical task in graphic design automation, focused on creating visually appealing arrangements of elements that seamlessly blend with a given background image. The variety of real-world applications makes it highly challenging to develop a single model capable of unifying the diver...","url_abs":"https://arxiv.org/abs/2512.08897","url_pdf":"https://arxiv.org/pdf/2512.08897v1","authors":"[\"Zeyang Liu\",\"Le Wang\",\"Sanping Zhou\",\"Yuxuan Wu\",\"Xiaolong Sun\",\"Gang Hua\",\"Haoxiang Li\"]","published":"2025-12-09T18:38:44Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\",\"Transformer\",\"LoRA\"]","has_code":false}