{"ID":2871274,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.12275","arxiv_id":"2509.12275","title":"Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering","abstract":"With the rapid progress of large audio-language models (LALMs), audio question answering (AQA) has emerged as a challenging task requiring both fine-grained audio understanding and complex reasoning. While current methods mainly rely on constructing new datasets via captioning or reasoning traces, existing high-quality AQA data remains underutilized. To address this, we propose Omni-CLST, an error-aware Curriculum Learning framework with guided Selective Chain-of-Thought. The framework efficiently leverages existing high-quality dataset through two key strategies: an error-aware curriculum that organizes samples by difficulty, and a guided thought dropout mechanism that focuses reasoning on challenging cases. Experiments show that Omni-CLST achieves 73.80% on MMAU-mini and a new state of the art of 64.30% on MMAR, demonstrating robust generalization in multimodal audio-language understanding.","short_abstract":"With the rapid progress of large audio-language models (LALMs), audio question answering (AQA) has emerged as a challenging task requiring both fine-grained audio understanding and complex reasoning. While current methods mainly rely on constructing new datasets via captioning or reasoning traces, existing high-quality...","url_abs":"https://arxiv.org/abs/2509.12275","url_pdf":"https://arxiv.org/pdf/2509.12275v3","authors":"[\"Jinghua Zhao\",\"Hang Su\",\"Lichun Fan\",\"Zhenbo Luo\",\"Hui Wang\",\"Haoqin Sun\",\"Yong Qin\"]","published":"2025-09-14T06:54:12Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.AI\",\"eess.AS\"]","methods":"[\"Language Model\",\"Generative Adversarial Network\"]","has_code":false}