{"ID":2869134,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.14592","arxiv_id":"2509.14592","title":"MMED: A Multimodal Micro-Expression Dataset based on Audio-Visual Fusion","abstract":"Micro-expressions (MEs) are crucial leakages of concealed emotion, yet their study has been constrained by a reliance on silent, visual-only data. To solve this issue, we introduce two principal contributions. First, MMED, to our knowledge, is the first dataset capturing the spontaneous vocal cues that co-occur with MEs in ecologically valid, high-stakes interactions. Second, the Asymmetric Multimodal Fusion Network (AMF-Net) is a novel method that effectively fuses a global visual summary with a dynamic audio sequence via an asymmetric cross-attention framework. Rigorous Leave-One-Subject-Out Cross-Validation (LOSO-CV) experiments validate our approach, providing conclusive evidence that audio offers critical, disambiguating information for ME analysis. Collectively, the MMED dataset and our AMF-Net method provide valuable resources and a validated analytical approach for micro-expression recognition.","short_abstract":"Micro-expressions (MEs) are crucial leakages of concealed emotion, yet their study has been constrained by a reliance on silent, visual-only data. To solve this issue, we introduce two principal contributions. First, MMED, to our knowledge, is the first dataset capturing the spontaneous vocal cues that co-occur with ME...","url_abs":"https://arxiv.org/abs/2509.14592","url_pdf":"https://arxiv.org/pdf/2509.14592v1","authors":"[\"Junbo Wang\",\"Yan Zhao\",\"Shuo Li\",\"Shibo Wang\",\"Shigang Wang\",\"Jian Wei\"]","published":"2025-09-18T03:52:38Z","proceeding":"cs.MM","tasks":"[\"cs.MM\",\"cs.SD\"]","methods":"[]","has_code":false}