{"ID":2857836,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.07665","arxiv_id":"2510.07665","title":"Automatic Text Box Placement for Supporting Typographic Design","abstract":"In layout design for advertisements and web pages, balancing visual appeal and communication efficiency is crucial. This study examines automated text box placement in incomplete layouts, comparing a standard Transformer-based method, a small Vision and Language Model (Phi3.5-vision), a large pretrained VLM (Gemini), and an extended Transformer that processes multiple images. Evaluations on the Crello dataset show the standard Transformer-based models generally outperform VLM-based approaches, particularly when incorporating richer appearance information. However, all methods face challenges with very small text or densely populated layouts. These findings highlight the benefits of task-specific architectures and suggest avenues for further improvement in automated layout design.","short_abstract":"In layout design for advertisements and web pages, balancing visual appeal and communication efficiency is crucial. This study examines automated text box placement in incomplete layouts, comparing a standard Transformer-based method, a small Vision and Language Model (Phi3.5-vision), a large pretrained VLM (Gemini), a...","url_abs":"https://arxiv.org/abs/2510.07665","url_pdf":"https://arxiv.org/pdf/2510.07665v1","authors":"[\"Jun Muraoka\",\"Daichi Haraguchi\",\"Naoto Inoue\",\"Wataru Shimoda\",\"Kota Yamaguchi\",\"Seiichi Uchida\"]","published":"2025-10-09T01:38:21Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Transformer\",\"Language Model\"]","has_code":false}