{"ID":2849277,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.24707","arxiv_id":"2510.24707","title":"MetricX-25 and GemSpanEval: Google Translate Submissions to the WMT25 Evaluation Shared Task","abstract":"In this paper, we present our submissions to the unified WMT25 Translation Evaluation Shared Task. For the Quality Score Prediction subtask, we create a new generation of MetricX with improvements in the input format and the training protocol, while for the Error Span Detection subtask we develop a new model, GemSpanEval, trained to predict error spans along with their severities and categories. Both systems are based on the state-of-the-art multilingual open-weights model Gemma 3, fine-tuned on publicly available WMT data. We demonstrate that MetricX-25, adapting Gemma 3 to an encoder-only architecture with a regression head on top, can be trained to effectively predict both MQM and ESA quality scores, and significantly outperforms its predecessor. Our decoder-only GemSpanEval model, on the other hand, we show to be competitive in error span detection with xCOMET, a strong encoder-only sequence-tagging baseline. With error span detection formulated as a generative task, we instruct the model to also output the context for each predicted error span, thus ensuring that error spans are identified unambiguously.","short_abstract":"In this paper, we present our submissions to the unified WMT25 Translation Evaluation Shared Task. For the Quality Score Prediction subtask, we create a new generation of MetricX with improvements in the input format and the training protocol, while for the Error Span Detection subtask we develop a new model, GemSpanEv...","url_abs":"https://arxiv.org/abs/2510.24707","url_pdf":"https://arxiv.org/pdf/2510.24707v1","authors":"[\"Juraj Juraska\",\"Tobias Domhan\",\"Mara Finkelstein\",\"Tetsuji Nakagawa\",\"Geza Kovacs\",\"Daniel Deutsch\",\"Pidong Wang\",\"Markus Freitag\"]","published":"2025-10-28T17:56:20Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[]","has_code":false}
