{"ID":2874539,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.04032","arxiv_id":"2509.04032","title":"What if I ask in \\textit{alia lingua}? Measuring Functional Similarity Across Languages","abstract":"How similar are model outputs across languages? In this work, we study this question using a recently proposed model similarity metric $κ_p$ applied to 20 languages and 47 subjects in GlobalMMLU. Our analysis reveals that a model's responses become increasingly consistent across languages as its size and capability grow. Interestingly, models exhibit greater cross-lingual consistency within themselves than agreement with other models prompted in the same language. These results highlight not only the value of $κ_p$ as a practical tool for evaluating multilingual reliability, but also its potential to guide the development of more consistent multilingual systems.","short_abstract":"How similar are model outputs across languages? In this work, we study this question using a recently proposed model similarity metric $κ_p$ applied to 20 languages and 47 subjects in GlobalMMLU. Our analysis reveals that a model's responses become increasingly consistent across languages as its size and capability gro...","url_abs":"https://arxiv.org/abs/2509.04032","url_pdf":"https://arxiv.org/pdf/2509.04032v2","authors":"[\"Debangan Mishra\",\"Arihant Rastogi\",\"Agyeya Negi\",\"Shashwat Goel\",\"Ponnurangam Kumaraguru\"]","published":"2025-09-04T09:08:39Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.LG\"]","methods":"[]","has_code":false}