{"ID":2921127,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-04T06:21:04.369492701Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.01800","arxiv_id":"2606.01800","title":"Multilinguality of Large Language Models From a Structural Perspective","abstract":"Large language models (LLMs) have excelled in processing multiple languages through pre- and post-training on multilingual data, even though English dominates the training data. Prior work focusing on token representations has revealed how those LLMs process non-English text. Although these analyses have provided insightful findings, they fail to capture a structural view, which is an inherent property of language. In this study, we explore the multilinguality of LLMs through representational structural analysis. Our findings reveal that low-resource languages are structurally more different from English than high- and mid-resource languages, and that language-specific post-training alters their structures while preserving inter-language relationships.","short_abstract":"Large language models (LLMs) have excelled in processing multiple languages through pre- and post-training on multilingual data, even though English dominates the training data. Prior work focusing on token representations has revealed how those LLMs process non-English text. Although these analyses have provided insig...","url_abs":"https://arxiv.org/abs/2606.01800","url_pdf":"https://arxiv.org/pdf/2606.01800v1","authors":"[\"Haruki Sakajo\",\"Yusuke Sakai\",\"Hidetaka Kamigaito\",\"Taro Watanabe\"]","published":"2026-06-01T07:18:09Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
