{"ID":2889279,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.03722","arxiv_id":"2508.03722","title":"Multimodal Video Emotion Recognition with Reliable Reasoning Priors","abstract":"This study investigates the integration of trustworthy prior reasoning knowledge from MLLMs into multimodal emotion recognition. We employ Gemini to generate fine-grained, modality-separable reasoning traces, which are injected as priors during the fusion stage to enrich cross-modal interactions. To mitigate the pronounced class-imbalance in multimodal emotion recognition, we introduce Balanced Dual-Contrastive Learning, a loss formulation that jointly balances inter-class and intra-class distributions. Applied to the MER2024 benchmark, our prior-enhanced framework yields substantial performance gains, demonstrating that the reliability of MLLM-derived reasoning can be synergistically combined with the domain adaptability of lightweight fusion networks for robust, scalable emotion recognition.","short_abstract":"This study investigates the integration of trustworthy prior reasoning knowledge from MLLMs into multimodal emotion recognition. We employ Gemini to generate fine-grained, modality-separable reasoning traces, which are injected as priors during the fusion stage to enrich cross-modal interactions. To mitigate the pronou...","url_abs":"https://arxiv.org/abs/2508.03722","url_pdf":"https://arxiv.org/pdf/2508.03722v1","authors":"[\"Zhepeng Wang\",\"Yingjian Zhu\",\"Guanghao Dong\",\"Hongzhu Yi\",\"Feng Chen\",\"Xinming Wang\",\"Jun Xie\"]","published":"2025-07-29T15:55:23Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false}