{"ID":2849247,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.24664","arxiv_id":"2510.24664","title":"MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation","abstract":"Human evaluation of machine translation is in an arms race with translation model quality: as our models get better, our evaluation methods need to be improved to ensure that quality gains are not lost in evaluation noise. To this end, we experiment with a two-stage version of the current state-of-the-art translation evaluation paradigm (MQM), which we call MQM re-annotation. In this setup, an MQM annotator reviews and edits a set of pre-existing MQM annotations, that may have come from themselves, another human annotator, or an automatic MQM annotation system. We demonstrate that rater behavior in re-annotation aligns with our goals, and that re-annotation results in higher-quality annotations, mostly due to finding errors that were missed during the first pass.","short_abstract":"Human evaluation of machine translation is in an arms race with translation model quality: as our models get better, our evaluation methods need to be improved to ensure that quality gains are not lost in evaluation noise. To this end, we experiment with a two-stage version of the current state-of-the-art translation e...","url_abs":"https://arxiv.org/abs/2510.24664","url_pdf":"https://arxiv.org/pdf/2510.24664v1","authors":"[\"Parker Riley\",\"Daniel Deutsch\",\"Mara Finkelstein\",\"Colten DiIanni\",\"Juraj Juraska\",\"Markus Freitag\"]","published":"2025-10-28T17:29:59Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[]","has_code":false}