{"ID":2879392,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.16265","arxiv_id":"2508.16265","title":"M3TQA: Massively Multilingual Multitask Table Question Answering","abstract":"Tabular data is a fundamental component of real-world information systems, yet most research in table understanding remains confined to English, leaving multilingual comprehension significantly underexplored. Existing multilingual table benchmarks suffer from geolinguistic imbalance - overrepresenting certain languages and lacking sufficient scale for rigorous cross-lingual analysis. To address these limitations, we introduce a comprehensive framework for massively multilingual multitask table question answering, featuring m3TQA-Instruct, a large-scale benchmark spanning 97 languages across diverse language families, including underrepresented and low-resource languages. We construct m3TQA by curating 50 real-world tables in Chinese and English, then applying a robust six-step LLM-based translation pipeline powered by DeepSeek and GPT-4o, achieving high translation fidelity with a median BLEU score of 60.19 as validated through back-translation. The benchmark includes 2,916 professionally annotated question-answering pairs across four tasks designed to evaluate nuanced table reasoning capabilities. Experiments on state-of-the-art LLMs reveal critical insights into cross-lingual generalization, demonstrating that synthetically generated, unannotated QA data can significantly boost performance, particularly for low-resource languages. M3T-Bench establishes a new standard for multilingual table understanding, providing both a challenging evaluation platform and a scalable methodology for future research.","short_abstract":"Tabular data is a fundamental component of real-world information systems, yet most research in table understanding remains confined to English, leaving multilingual comprehension significantly underexplored. Existing multilingual table benchmarks suffer from geolinguistic imbalance - overrepresenting certain languages...","url_abs":"https://arxiv.org/abs/2508.16265","url_pdf":"https://arxiv.org/pdf/2508.16265v1","authors":"[\"Daixin Shu\",\"Jian Yang\",\"Zhenhe Wu\",\"Xianjie Wu\",\"Xianfu Cheng\",\"Xiangyuan Guan\",\"Yanghai Wang\",\"Pengfei Wu\",\"Tingyang Yang\",\"Hualei Zhu\",\"Wei Zhang\",\"Ge Zhang\",\"Jiaheng Liu\",\"Zhoujun Li\"]","published":"2025-08-22T09:57:40Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\"]","has_code":false}
