{"ID":2841520,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.10984","arxiv_id":"2511.10984","title":"DiscoX: Benchmarking Discourse-Level Translation task in Expert Domains","abstract":"The evaluation of discourse-level translation in expert domains remains inadequate, despite its centrality to knowledge dissemination and cross-lingual scholarly communication. While these translations demand discourse-level coherence and strict terminological precision, current evaluation methods predominantly focus on segment-level accuracy and fluency. To address this limitation, we introduce DiscoX, a new benchmark for discourse-level and expert-level Chinese-English translation. It comprises 200 professionally-curated texts from 7 domains, with an average length exceeding 1700 tokens. To evaluate performance on DiscoX, we also develop Metric-S, a reference-free system that provides fine-grained automatic assessments across accuracy, fluency, and appropriateness. Metric-S demonstrates strong consistency with human judgments, significantly outperforming existing metrics. Our experiments reveal a remarkable performance gap: even the most advanced LLMs still trail human experts on these tasks. This finding validates the difficulty of DiscoX and underscores the challenges that remain in achieving professional-grade machine translation. The proposed benchmark and evaluation system provide a robust framework for more rigorous evaluation, facilitating future advancements in LLM-based translation.","short_abstract":"The evaluation of discourse-level translation in expert domains remains inadequate, despite its centrality to knowledge dissemination and cross-lingual scholarly communication. While these translations demand discourse-level coherence and strict terminological precision, current evaluation methods predominantly focus o...","url_abs":"https://arxiv.org/abs/2511.10984","url_pdf":"https://arxiv.org/pdf/2511.10984v2","authors":"[\"Xiying Zhao\",\"Zhoufutu Wen\",\"Zhixuan Chen\",\"Jingzhe Ding\",\"Jianpeng Jiao\",\"Shuai Li\",\"Xi Li\",\"Danni Liang\",\"Shengda Long\",\"Qianqian Liu\",\"Xianbo Wu\",\"Hongwan Gao\",\"Xiang Gao\",\"Liang Hu\",\"Jiashuo Liu\",\"Mengyun Liu\",\"Weiran Shi\",\"Chenghao Yang\",\"Qianyu Yang\",\"Xuanliang Zhang\",\"Ge Zhang\",\"Wenhao Huang\",\"Yuwen Tang\"]","published":"2025-11-14T06:09:37Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false}