{"ID":2891077,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.18791","arxiv_id":"2507.18791","title":"CodeMixBench: Evaluating Code-Mixing Capabilities of LLMs Across 18 Languages","abstract":"Code-mixing, the practice of switching between languages within a conversation, poses unique challenges for traditional NLP. Existing benchmarks are limited by their narrow language pairs and tasks, failing to adequately assess large language models' (LLMs) code-mixing abilities. Despite the recognized importance of code-mixing for multilingual users, research on LLMs in this context remains sparse. Additionally, current techniques for synthesizing code-mixed data are underdeveloped to generate code-mixing. In response, we introduce CodeMixBench, a comprehensive benchmark covering eight tasks, including three specific to LLMs and five traditional NLP tasks, and 18 languages across seven language families. We also propose a new method for generating large-scale synthetic code-mixed texts by combining word substitution with GPT-4 prompting. Our evaluation reveals consistent underperformance of LLMs on code-mixed datasets involving different language families. Enhancements in training data size, model scale, and few-shot learning could improve their performance. The code and dataset are available at https://github.com/Jeromeyluck/CodeMixBench.","short_abstract":"Code-mixing, the practice of switching between languages within a conversation, poses unique challenges for traditional NLP. Existing benchmarks are limited by their narrow language pairs and tasks, failing to adequately assess large language models' (LLMs) code-mixing abilities. Despite the recognized importance of co...","url_abs":"https://arxiv.org/abs/2507.18791","url_pdf":"https://arxiv.org/pdf/2507.18791v2","authors":"[\"Yilun Yang\",\"Yekun Chai\"]","published":"2025-07-24T20:24:33Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":611845,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2891077,"paper_url":"https://arxiv.org/abs/2507.18791","paper_title":"CodeMixBench: Evaluating Code-Mixing Capabilities of LLMs Across 18 Languages","repo_url":"https://github.com/Jeromeyluck/CodeMixBench","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}