{"ID":2878566,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.18337","arxiv_id":"2508.18337","title":"Warm Chat: Diffuse Emotion-aware Interactive Talking Head Avatar with Tree-Structured Guidance","abstract":"Generative models have advanced rapidly, enabling impressive talking head generation that brings AI to life. However, most existing methods focus solely on one-way portrait animation. Even the few that support bidirectional conversational interactions lack precise emotion-adaptive capabilities, significantly limiting their practical applicability. In this paper, we propose Warm Chat, a novel emotion-aware talking head generation framework for dyadic interactions. Leveraging the dialogue generation capability of large language models (LLMs, e.g., GPT-4), our method produces temporally consistent virtual avatars with rich emotional variations that seamlessly transition between speaking and listening states. Specifically, we design a Transformer-based head mask generator that learns temporally consistent motion features in a latent mask space, capable of generating arbitrary-length, temporally consistent mask sequences to constrain head motions. Furthermore, we introduce an interactive talking tree structure to represent dialogue state transitions, where each tree node contains information such as child/parent/sibling nodes and the current character's emotional state. By performing reverse-level traversal, we extract rich historical emotional cues from the current node to guide expression synthesis. Extensive experiments demonstrate the superior performance and effectiveness of our method.","short_abstract":"Generative models have advanced rapidly, enabling impressive talking head generation that brings AI to life. However, most existing methods focus solely on one-way portrait animation. Even the few that support bidirectional conversational interactions lack precise emotion-adaptive capabilities, significantly limiting t...","url_abs":"https://arxiv.org/abs/2508.18337","url_pdf":"https://arxiv.org/pdf/2508.18337v3","authors":"[\"Haijie Yang\",\"Zhenyu Zhang\",\"Hao Tang\",\"Jianjun Qian\",\"Jian Yang\"]","published":"2025-08-25T13:07:03Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.AI\",\"cs.SD\"]","methods":"[\"Transformer\",\"Large Language Model\",\"Language Model\"]","has_code":false}
