{"ID":2867680,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.17711","arxiv_id":"2509.17711","title":"DA-Mamba: Dialogue-aware selective state-space model for multimodal engagement estimation","abstract":"Human engagement estimation in conversational scenarios is essential for applications such as adaptive tutoring, remote healthcare assessment, and socially aware human--computer interaction. Engagement is a dynamic, multimodal signal conveyed by facial expressions, speech, gestures, and behavioral cues over time. In this work we introduce DA-Mamba, a dialogue-aware multimodal architecture that replaces attention-heavy dialogue encoders with Mamba-based selective state-space processing to achieve linear time and memory complexity while retaining expressive cross-modal reasoning. We design a Mamba dialogue-aware selective state-space model composed of three core modules: a Dialogue-Aware Encoder and two Mamba-based fusion mechanisms, Modality-Group Fusion and Partner-Group Fusion. Together, these modules achieve expressive dialogue understanding. Extensive experiments on three standard benchmarks (NoXi, NoXi-Add, and MPIIGI) show that DA-Mamba surpasses prior state-of-the-art (SOTA) methods in concordance correlation coefficient (CCC), while reducing training time and peak memory; these gains enable processing much longer sequences and facilitate real-time deployment in resource-constrained, multi-party conversational settings. The source code will be available at: https://github.com/kksssssss-ssda/MMEA.","short_abstract":"Human engagement estimation in conversational scenarios is essential for applications such as adaptive tutoring, remote healthcare assessment, and socially aware human--computer interaction. Engagement is a dynamic, multimodal signal conveyed by facial expressions, speech, gestures, and behavioral cues over time. 
In th...","url_abs":"https://arxiv.org/abs/2509.17711","url_pdf":"https://arxiv.org/pdf/2509.17711v1","authors":"[\"Shenwei Kang\",\"Xin Zhang\",\"Wen Liu\",\"Bin Li\",\"Yujie Liu\",\"Bo Gao\"]","published":"2025-09-22T12:48:42Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[]","has_code":false,"code_links":[{"ID":609506,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2867680,"paper_url":"https://arxiv.org/abs/2509.17711","paper_title":"DA-Mamba: Dialogue-aware selective state-space model for multimodal engagement estimation","repo_url":"https://github.com/kksssssss-ssda/MMEA","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
