{"ID":2893990,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.12260","arxiv_id":"2507.12260","title":"Translationese-index: Using Likelihood Ratios for Graded and Generalizable Measurement of Translationese","abstract":"Translationese refers to linguistic properties that usually occur in translated texts. Previous works study translationese by framing it as a binary classification between original texts and translated texts. In this paper, we argue that translationese should be graded instead of binary and propose the first measure for translationese -- the translationese-index (T-index), computed from the likelihood ratios of two contrastively fine-tuned language models (LMs). We use synthesized translations and translations in the wild to evaluate T-index's generalizability in cross-domain settings and its validity against human judgments. Our results show that T-index can generalize to unseen genres, authors, and language pairs. Moreover, T-index computed using two 0.5B LMs fine-tuned on only 1-5k pairs of synthetic data can effectively capture translationese, as demonstrated by alignment with human pointwise ratings and pairwise judgments. Additionally, the correlation between T-index and existing machine translation (MT) quality estimation (QE) metrics such as BLEU and COMET is low, suggesting that T-index is not covered by these metrics and can serve as a complementary metric in MT QE.","short_abstract":"Translationese refers to linguistic properties that usually occur in translated texts. Previous works study translationese by framing it as a binary classification between original texts and translated texts. In this paper, we argue that translationese should be graded instead of binary and propose the first measure fo...","url_abs":"https://arxiv.org/abs/2507.12260","url_pdf":"https://arxiv.org/pdf/2507.12260v2","authors":"[\"Yikang Liu\",\"Wanyang Zhang\",\"Yiming Wang\",\"Jialong Tang\",\"Pei Zhang\",\"Baosong Yang\",\"Fei Huang\",\"Rui Wang\",\"Hai Hu\"]","published":"2025-07-16T14:06:05Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Language Model\"]","has_code":false}
