{"ID":3083780,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T03:38:11.424509713Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.06066","arxiv_id":"2606.06066","title":"FontFusion: Enhancing Generative Text in Diffusion Models with Typographic Conditioning","abstract":"Typography generation in diffusion models faces a persistent trade-off: enabling precise font control typically degrades text legibility, while maintaining readability often sacrifices typographic fidelity. We present FontFusion, a plug-and-play conditioning framework for Diffusion Transformer (DiT) architectures that resolves this dilemma through three core innovations: (1) a hierarchical token representation establishing explicit text-font relationships at multiple granularities, (2) position-aware embeddings creating spatial bindings between typography and image content, and (3) a multi-level token dropping strategy improving both computational efficiency and generalization to unseen fonts. Our systematic evaluation of font embedding spaces reveals that a dual encoder combining DeepFont and DINOv2 outperforms any single encoder for typography tasks. FontFusion demonstrates 76% relative improvement on challenging decorative fonts over single-encoder baselines and font consistency gains exceeding approximately 68-76% over unconditioned models, while integrating into existing DiT architectures without retraining.","short_abstract":"Typography generation in diffusion models faces a persistent trade-off: enabling precise font control typically degrades text legibility, while maintaining readability often sacrifices typographic fidelity. We present FontFusion, a plug-and-play conditioning framework for Diffusion Transformer (DiT) architectures that...","url_abs":"https://arxiv.org/abs/2606.06066","url_pdf":"https://arxiv.org/pdf/2606.06066v1","authors":"[\"Marian Lupascu\",\"Nipun Jindal\",\"Ionut Mironica\",\"Zhaowen Wang\"]","published":"2026-06-04T12:07:12Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.GR\"]","methods":"[\"Diffusion Model\",\"Transformer\"]","has_code":false}
