{"ID":2885285,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.05468","arxiv_id":"2508.05468","title":"TASE: Token Awareness and Structured Evaluation for Multilingual Language Models","abstract":"While large language models (LLMs) have demonstrated remarkable performance on high-level semantic tasks, they often struggle with fine-grained, token-level understanding and structural reasoning--capabilities that are essential for applications requiring precision and control. We introduce TASE, a comprehensive benchmark designed to evaluate LLMs' ability to perceive and reason about token-level information across languages. TASE covers 10 tasks under two core categories: token awareness and structural understanding, spanning Chinese, English, and Korean, with a 35,927-instance evaluation set and a scalable synthetic data generation pipeline for training. Tasks include character counting, token alignment, syntactic structure parsing, and length constraint satisfaction. We evaluate over 30 leading commercial and open-source LLMs, including O3, Claude 4, Gemini 2.5 Pro, and DeepSeek-R1, and train a custom Qwen2.5-14B model using the GRPO training method. Results show that human performance significantly outpaces current LLMs, revealing persistent weaknesses in token-level reasoning. TASE sheds light on these limitations and provides a new diagnostic lens for future improvements in low-level language understanding and cross-lingual generalization. Our code and dataset are publicly available at https://github.com/cyzcz/Tase .","short_abstract":"While large language models (LLMs) have demonstrated remarkable performance on high-level semantic tasks, they often struggle with fine-grained, token-level understanding and structural reasoning--capabilities that are essential for applications requiring precision and control. \nWe introduce TASE, a comprehensive benchm...","url_abs":"https://arxiv.org/abs/2508.05468","url_pdf":"https://arxiv.org/pdf/2508.05468v1","authors":"[\"Chenzhuo Zhao\",\"Xinda Wang\",\"Yue Huang\",\"Junting Lu\",\"Ziqian Liu\"]","published":"2025-08-07T15:11:17Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":611174,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2885285,"paper_url":"https://arxiv.org/abs/2508.05468","paper_title":"TASE: Token Awareness and Structured Evaluation for Multilingual Language Models","repo_url":"https://github.com/cyzcz/Tase","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
