{"ID":2865361,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.22363","arxiv_id":"2509.22363","title":"Investigating Faithfulness in Large Audio Language Models","abstract":"Large Audio Language Models (LALMs) integrate audio encoders with pretrained Large Language Models to perform complex multimodal reasoning tasks. While these models can generate Chain-of-Thought (CoT) explanations, the faithfulness of these reasoning chains remains unclear. In this work, we propose a systematic framework to evaluate CoT faithfulness in LALMs with respect to both the input audio and the final model prediction. We define three criteria for audio faithfulness: hallucination-free, holistic, and attentive listening. We also introduce a benchmark based on both audio and CoT interventions to assess faithfulness. Experiments on Audio Flamingo 3 and Qwen2.5-Omni suggest a potential multimodal disconnect: reasoning often aligns with the final prediction but is not always strongly grounded in the audio and can be vulnerable to hallucinations or adversarial perturbations.","short_abstract":"Large Audio Language Models (LALMs) integrate audio encoders with pretrained Large Language Models to perform complex multimodal reasoning tasks. While these models can generate Chain-of-Thought (CoT) explanations, the faithfulness of these reasoning chains remains unclear. In this work, we propose a systematic framewo...","url_abs":"https://arxiv.org/abs/2509.22363","url_pdf":"https://arxiv.org/pdf/2509.22363v3","authors":"[\"Pooneh Mousavi\",\"Lovenya Jain\",\"Mirco Ravanelli\",\"Cem Subakan\"]","published":"2025-09-26T13:58:22Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"eess.AS\"]","methods":"[\"Language Model\"]","has_code":false}
