{"ID":2865430,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.22461","arxiv_id":"2509.22461","title":"CMDAR: A Chinese Multi-scene Dynamic Audio Reasoning Benchmark with Diverse Challenges","abstract":"The ability to reason from audio, including speech, environmental sounds, and music, is essential for AI agents to interact effectively in real-world scenarios. Existing benchmarks mainly focus on static or single-scene settings and English audio data and do not fully capture scenarios where multiple speakers, unfolding events, and heterogeneous audio sources interact. To address these challenges, we introduce CMDAR, a Chinese benchmark for evaluating models on complex, multi-scene, and dynamically evolving audio reasoning tasks. CMDAR comprises 3,000 carefully curated question-answer pairs linked to diverse audio clips, covering five categories of complex reasoning and spanning three question types. We benchmark 26 state-of-the-art audio language models on CMDAR and observe that they exhibit limitations in complex reasoning tasks. In CMDAR-main, Qwen2.5-Omni achieves 76.67% accuracy, whereas GPT-4o Audio reaches 68.47%. However, GPT-4o Audio substantially outperforms Qwen2.5-Omni on the more challenging multiple-choice with multiple audios and open-ended tasks. We also provide a detailed analysis and corresponding suggestions for the future development of large audio language models.","short_abstract":"The ability to reason from audio, including speech, environmental sounds, and music, is essential for AI agents to interact effectively in real-world scenarios. Existing benchmarks mainly focus on static or single-scene settings and English audio data and do not fully capture scenarios where multiple speakers, unfoldin...","url_abs":"https://arxiv.org/abs/2509.22461","url_pdf":"https://arxiv.org/pdf/2509.22461v3","authors":"[\"Hui Li\",\"Changhao Jiang\",\"Hongyu Wang\",\"Ming Zhang\",\"Jiajun Sun\",\"Zhixiong Yang\",\"Yifei Cao\",\"Shihan Dou\",\"Xiaoran Fan\",\"Baoyu Fan\",\"Tao Ji\",\"Tao Gui\",\"Qi Zhang\",\"Xuanjing Huang\"]","published":"2025-09-26T15:12:46Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.AI\",\"cs.CL\",\"eess.AS\"]","methods":"[\"Language Model\"]","has_code":false}
