{"ID":2846386,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.02958","arxiv_id":"2511.02958","title":"Automatic Machine Translation Detection Using a Surrogate Multilingual Translation Model","abstract":"Modern machine translation (MT) systems depend on large parallel corpora, often collected from the Internet. However, recent evidence indicates that (i) a substantial portion of these texts are machine-generated translations, and (ii) an overreliance on such synthetic content in training data can significantly degrade translation quality. As a result, filtering out non-human translations is becoming an essential pre-processing step in building high-quality MT systems. In this work, we propose a novel approach that directly exploits the internal representations of a surrogate multilingual MT model to distinguish between human and machine-translated sentences. Experimental results show that our method outperforms current state-of-the-art techniques, particularly for non-English language pairs, achieving gains of at least 5 percentage points of accuracy.","short_abstract":"Modern machine translation (MT) systems depend on large parallel corpora, often collected from the Internet. However, recent evidence indicates that (i) a substantial portion of these texts are machine-generated translations, and (ii) an overreliance on such synthetic content in training data can significantly degrade...","url_abs":"https://arxiv.org/abs/2511.02958","url_pdf":"https://arxiv.org/pdf/2511.02958v1","authors":"[\"Cristian García-Romero\",\"Miquel Esplà-Gomis\",\"Felipe Sánchez-Martínez\"]","published":"2025-11-04T19:59:25Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.LG\"]","methods":"[]","has_code":false}
