{"ID":2881097,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.12854","arxiv_id":"2508.12854","title":"E3RG: Building Explicit Emotion-driven Empathetic Response Generation System with Multimodal Large Language Model","abstract":"Multimodal Empathetic Response Generation (MERG) is crucial for building emotionally intelligent human-computer interactions. Although large language models (LLMs) have improved text-based ERG, challenges remain in handling multimodal emotional content and maintaining identity consistency. Thus, we propose E3RG, an Explicit Emotion-driven Empathetic Response Generation System based on multimodal LLMs which decomposes MERG task into three parts: multimodal empathy understanding, empathy memory retrieval, and multimodal response generation. By integrating advanced expressive speech and video generative models, E3RG delivers natural, emotionally rich, and identity-consistent responses without extra training. Experiments validate the superiority of our system on both zero-shot and few-shot settings, securing Top-1 position in the Avatar-based Multimodal Empathy Challenge on ACM MM 25. Our code is available at https://github.com/RH-Lin/E3RG.","short_abstract":"Multimodal Empathetic Response Generation (MERG) is crucial for building emotionally intelligent human-computer interactions. Although large language models (LLMs) have improved text-based ERG, challenges remain in handling multimodal emotional content and maintaining identity consistency. Thus, we propose E3RG, an Exp...","url_abs":"https://arxiv.org/abs/2508.12854","url_pdf":"https://arxiv.org/pdf/2508.12854v1","authors":"[\"Ronghao Lin\",\"Shuai Shen\",\"Weipeng Hu\",\"Qiaolin He\",\"Aolin Xiong\",\"Li Huang\",\"Haifeng Hu\",\"Yap-peng Tan\"]","published":"2025-08-18T11:47:02Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\",\"cs.CV\",\"cs.HC\",\"cs.MM\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":610761,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2881097,"paper_url":"https://arxiv.org/abs/2508.12854","paper_title":"E3RG: Building Explicit Emotion-driven Empathetic Response Generation System with Multimodal Large Language Model","repo_url":"https://github.com/RH-Lin/E3RG","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
