{"ID":2856352,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.11391","arxiv_id":"2510.11391","title":"DocReward: A Document Reward Model for Structuring and Stylizing","abstract":"Recent agentic workflows automate professional document generation but focus narrowly on textual quality, overlooking structural and stylistic professionalism, which is equally critical for readability. This gap stems mainly from a lack of effective reward models capable of guiding agents toward producing documents with high structural and stylistic professionalism. We introduce DocReward, a document reward model that evaluates documents based on their structure and style. To achieve this, we propose a textual-quality-agnostic framework that ensures assessments are not confounded by content quality, and construct DocPair, a dataset of 117K paired documents covering 32 domains and 267 types. Each pair shares identical content but differs in structural and stylistic professionalism. DocReward is trained using the Bradley-Terry loss. On a manually annotated benchmark, DocReward outperforms GPT-5 by 14.6 percentage points in the same setting. Reinforcement learning experiments further show that DocReward effectively guides agents toward generating documents with consistently higher structural and stylistic professionalism, highlighting its practical utility.","short_abstract":"Recent agentic workflows automate professional document generation but focus narrowly on textual quality, overlooking structural and stylistic professionalism, which is equally critical for readability. \nThis gap stems mainly from a lack of effective reward models capable of guiding agents toward producing documents wit...","url_abs":"https://arxiv.org/abs/2510.11391","url_pdf":"https://arxiv.org/pdf/2510.11391v3","authors":"[\"Junpeng Liu\",\"Yuzhong Zhao\",\"Bowen Cao\",\"Jiayu Ding\",\"Yilin Jia\",\"Tengchao Lv\",\"Yupan Huang\",\"Wenshan Wu\",\"Shaohan Huang\",\"Nan Yang\",\"Li Dong\",\"Lei Cui\",\"Tao Ge\",\"Xun Wang\",\"Huitian Jiao\",\"Sun Mao\",\"FNU Kartik\",\"Si-Qing Chen\",\"Wai Lam\",\"Furu Wei\"]","published":"2025-10-13T13:36:32Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.CL\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
