{"ID":2826344,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.19546","arxiv_id":"2512.19546","title":"ActAvatar: Temporally-Aware Precise Action Control for Talking Avatars","abstract":"Despite significant advances in talking avatar generation, existing methods face critical challenges: insufficient text-following capability for diverse actions, lack of temporal alignment between actions and audio content, and dependency on additional control signals such as pose skeletons. We present ActAvatar, a framework that achieves phase-level precision in action control through textual guidance by capturing both action semantics and temporal context. Our approach introduces three core innovations: (1) Phase-Aware Cross-Attention (PACA), which decomposes prompts into a global base block and temporally-anchored phase blocks, enabling the model to concentrate on phase-relevant tokens for precise temporal-semantic alignment; (2) Progressive Audio-Visual Alignment, which aligns modality influence with the hierarchical feature learning process: early layers prioritize text for establishing action structure while deeper layers emphasize audio for refining lip movements, preventing modality interference; (3) A two-stage training strategy that first establishes robust audio-visual correspondence on diverse data, then injects action control through fine-tuning on structured annotations, maintaining both audio-visual alignment and the model's text-following capabilities. 
Extensive experiments demonstrate that ActAvatar significantly outperforms state-of-the-art methods in both action control and visual quality.","short_abstract":"Despite significant advances in talking avatar generation, existing methods face critical challenges: insufficient text-following capability for diverse actions, lack of temporal alignment between actions and audio content, and dependency on additional control signals such as pose skeletons. We present ActAvatar, a fra...","url_abs":"https://arxiv.org/abs/2512.19546","url_pdf":"https://arxiv.org/pdf/2512.19546v2","authors":"[\"Ziqiao Peng\",\"Yi Chen\",\"Yifeng Ma\",\"Guozhen Zhang\",\"Zhiyao Sun\",\"Zixiang Zhou\",\"Youliang Zhang\",\"Zhengguang Zhou\",\"Zhaoxin Fan\",\"Hongyan Liu\",\"Yuan Zhou\",\"Qinglin Lu\",\"Jun He\"]","published":"2025-12-22T16:28:27Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
