{"ID":3005126,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-05T07:50:16.0004273Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03402","arxiv_id":"2606.03402","title":"Mamba-Enhanced Implicit Motion Learning for Audio-Driven Portrait Animation","abstract":"Audio-driven human motion video generation aims to synthesize realistic and temporally coherent human animations from a single static image, with applications in talking-head synthesis, co-speech gesture generation, and dynamic presentations. Moving beyond conventional keypoint-based methods that often struggle to capture subtle motion dynamics, We propose a novel implicit-motion framework for generating realistic and temporally coherent human motion videos from a single static image and audio. Our approach uses a two-stage pipeline that decouples motion prediction from rendering. The first stage integrates appearance priors and hierarchical depth cues into a region-aware attention mechanism to model latent motion features. The second stage employs a Mamba-enhanced diffusion model to directly predict these features from audio and the source image, enabling unsupervised learning of fine-grained motion patterns. This decoupled architecture enhances flexibility and efficiency. Trained on a new 380-hour high-quality dataset, our method outperforms prior work across multiple public benchmarks and our collected data in accuracy, naturalness, and temporal coherence, setting a new state-of-the-art.","short_abstract":"Audio-driven human motion video generation aims to synthesize realistic and temporally coherent human animations from a single static image, with applications in talking-head synthesis, co-speech gesture generation, and dynamic presentations. Moving beyond conventional keypoint-based methods that often struggle to capt...","url_abs":"https://arxiv.org/abs/2606.03402","url_pdf":"https://arxiv.org/pdf/2606.03402v1","authors":"[\"Xuan Wei\",\"Jiahui Chen\",\"Kaiheng Li\",\"Mingyu Shao\",\"Qingqi Hong\"]","published":"2026-06-02T09:43:55Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false}
