{"ID":2857665,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.09536","arxiv_id":"2510.09536","title":"Evaluating Robustness of Large Language Models Against Multilingual Typographical Errors","abstract":"Large language models (LLMs) are increasingly deployed in multilingual, real-world applications with user inputs -- naturally introducing typographical errors (typos). Yet most benchmarks assume clean input, leaving the robustness of LLMs to typos across languages largely underexplored. To address this gap, we introduce MulTypo, a multilingual typo generation algorithm that simulates human-like errors based on language-specific keyboard layouts and typing behavior. We evaluate 18 open-source LLMs across three model families and five downstream tasks spanning language inference, multi-choice question answering, mathematical reasoning, and machine translation tasks. Our results show that typos consistently degrade performance, particularly in generative tasks and those requiring reasoning -- while the natural language inference task is comparatively more robust. Instruction tuning improves clean-input performance but may increase brittleness under noise. We also observe language-dependent robustness: high-resource languages are generally more robust than low-resource ones, and translation from English is more robust than translation into English. Our findings underscore the need for noise-aware training and multilingual robustness evaluation. We release a Python package for MulTypo and make the source code publicly available at https://github.com/cisnlp/multypo.","short_abstract":"Large language models (LLMs) are increasingly deployed in multilingual, real-world applications with user inputs -- naturally introducing typographical errors (typos). Yet most benchmarks assume clean input, leaving the robustness of LLMs to typos across languages largely underexplored. To address this gap, we i...","url_abs":"https://arxiv.org/abs/2510.09536","url_pdf":"https://arxiv.org/pdf/2510.09536v3","authors":"[\"Raoyuan Zhao\",\"Yihong Liu\",\"Lena Altinger\",\"Hinrich Schütze\",\"Michael A. Hedderich\"]","published":"2025-10-10T16:49:12Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":608473,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2857665,"paper_url":"https://arxiv.org/abs/2510.09536","paper_title":"Evaluating Robustness of Large Language Models Against Multilingual Typographical Errors","repo_url":"https://github.com/cisnlp/multypo","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
