{"ID":2862226,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.01146","arxiv_id":"2510.01146","title":"mR3: Multilingual Rubric-Agnostic Reward Reasoning Models","abstract":"Evaluation using Large Language Model (LLM) judges has been widely adopted in English and shown to be effective for automatic evaluation. However, their performance does not generalize well to non-English settings, and it remains unclear what constitutes effective multilingual training for such judges. In this paper, we introduce mR3, a massively multilingual, rubric-agnostic reward reasoning model trained on 72 languages, achieving the broadest language coverage in reward modeling to date. We present a comprehensive study of data and curriculum selection for training to identify effective strategies and data sources for building high-quality reward models, including support for reasoning in the target language. Our approach attains state-of-the-art performance on multilingual reward model benchmarks, surpassing much larger models (i.e., GPT-OSS-120B) while being up to 9x smaller, and its effectiveness is further confirmed through extensive ablation studies. Finally, we demonstrate the effectiveness of mR3 in off-policy preference optimization and validate the quality of its reasoning traces and rubric-based evaluations through human studies with 20 annotators across 12 languages, where mR3 models' reasoning is preferred, including for extremely low-resource languages that are entirely unseen during training. Our models, data, and code are available as open source at https://github.com/rubricreward/mr3.","short_abstract":"Evaluation using Large Language Model (LLM) judges has been widely adopted in English and shown to be effective for automatic evaluation. However, their performance does not generalize well to non-English settings, and it remains unclear what constitutes effective multilingual training for such judges. In this paper, w...","url_abs":"https://arxiv.org/abs/2510.01146","url_pdf":"https://arxiv.org/pdf/2510.01146v2","authors":"[\"David Anugraha\",\"Shou-Yi Hung\",\"Zilu Tang\",\"Annie En-Shiun Lee\",\"Derry Tanti Wijaya\",\"Genta Indra Winata\"]","published":"2025-10-01T17:36:59Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":608880,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2862226,"paper_url":"https://arxiv.org/abs/2510.01146","paper_title":"mR3: Multilingual Rubric-Agnostic Reward Reasoning Models","repo_url":"https://github.com/rubricreward/mr3","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
