{"ID":2841532,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.11000","arxiv_id":"2511.11000","title":"DialogGraph-LLM: Graph-Informed LLMs for End-to-End Audio Dialogue Intent Recognition","abstract":"Recognizing speaker intent in long audio dialogues among speakers has a wide range of applications, but is a non-trivial AI task due to complex inter-dependencies in speaker utterances and scarce annotated data. To address these challenges, an end-to-end framework, namely DialogGraph-LLM, is proposed in the current work. DialogGraph-LLM combines a novel Multi-Relational Dialogue Attention Network (MR-DAN) architecture with multimodal foundation models (e.g., Qwen2.5-Omni-7B) for direct acoustic-to-intent inference. An adaptive semi-supervised learning strategy is designed using LLM with a confidence-aware pseudo-label generation mechanism based on dual-threshold filtering using both global and class confidences, and an entropy-based sample selection process that prioritizes high-information unlabeled instances. Extensive evaluations on the proprietary MarketCalls corpus and the publicly available MIntRec 2.0 benchmark demonstrate DialogGraph-LLM's superiority over strong audio and text-driven baselines. The framework demonstrates strong performance and efficiency in intent recognition in real world scenario audio dialogues, proving its practical value for audio-rich domains with limited supervision. Our code is available at https://github.com/david188888/DialogGraph-LLM.","short_abstract":"Recognizing speaker intent in long audio dialogues among speakers has a wide range of applications, but is a non-trivial AI task due to complex inter-dependencies in speaker utterances and scarce annotated data. To address these challenges, an end-to-end framework, namely DialogGraph-LLM, is proposed in the current wor...","url_abs":"https://arxiv.org/abs/2511.11000","url_pdf":"https://arxiv.org/pdf/2511.11000v2","authors":"[\"HongYu Liu\",\"Junxin Li\",\"Changxi Guo\",\"Hao Chen\",\"Yaqian Huang\",\"Yifu Guo\",\"Huan Yang\",\"Lihua Cai\"]","published":"2025-11-14T06:42:04Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false,"code_links":[{"ID":607064,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2841532,"paper_url":"https://arxiv.org/abs/2511.11000","paper_title":"DialogGraph-LLM: Graph-Informed LLMs for End-to-End Audio Dialogue Intent Recognition","repo_url":"https://github.com/david188888/DialogGraph-LLM","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}