{"ID":2886438,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.03560","arxiv_id":"2508.03560","title":"LaTCoder: Converting Webpage Design to Code with Layout-as-Thought","abstract":"Converting webpage designs into code (design-to-code) plays a vital role in User Interface (UI) development for front-end developers, bridging the gap between visual design and functional implementation. While recent Multimodal Large Language Models (MLLMs) have shown significant potential in design-to-code tasks, they often fail to accurately preserve the layout during code generation. To this end, we draw inspiration from the Chain-of-Thought (CoT) reasoning in human cognition and propose LaTCoder, a novel approach that enhances layout preservation in webpage design during code generation with Layout-as-Thought (LaT). Specifically, we first introduce a simple yet efficient algorithm to divide the webpage design into image blocks. Next, we prompt MLLMs using a CoT-based approach to generate code for each block. Finally, we apply two assembly strategies (absolute positioning and an MLLM-based method), followed by dynamic selection to determine the optimal output. We evaluate the effectiveness of LaTCoder using multiple backbone MLLMs (i.e., DeepSeek-VL2, Gemini, and GPT-4o) on both a public benchmark and a newly introduced, more challenging benchmark (CC-HARD) that features complex layouts. The experimental results on automatic metrics demonstrate significant improvements. Specifically, TreeBLEU scores increased by 66.67% and MAE decreased by 38% when using DeepSeek-VL2, compared to direct prompting. Moreover, the human preference evaluation results indicate that annotators favor the webpages generated by LaTCoder in over 60% of cases, providing strong evidence of the effectiveness of our method.","short_abstract":"Converting webpage designs into code (design-to-code) plays a vital role in User Interface (UI) development for front-end developers, bridging the gap between visual design and functional implementation. While recent Multimodal Large Language Models (MLLMs) have shown significant potential in design-to-code tasks, they...","url_abs":"https://arxiv.org/abs/2508.03560","url_pdf":"https://arxiv.org/pdf/2508.03560v1","authors":"[\"Yi Gui\",\"Zhen Li\",\"Zhongyi Zhang\",\"Guohao Wang\",\"Tianpeng Lv\",\"Gaoyang Jiang\",\"Yi Liu\",\"Dongping Chen\",\"Yao Wan\",\"Hongyu Zhang\",\"Wenbin Jiang\",\"Xuanhua Shi\",\"Hai Jin\"]","published":"2025-08-05T15:28:48Z","proceeding":"cs.SE","tasks":"[\"cs.SE\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
