{"ID":2893351,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.12804","arxiv_id":"2507.12804","title":"ATL-Diff: Audio-Driven Talking Head Generation with Early Landmarks-Guide Noise Diffusion","abstract":"Audio-driven talking head generation requires precise synchronization between facial animations and audio signals. This paper introduces ATL-Diff, a novel approach addressing synchronization limitations while reducing noise and computational costs. Our framework features three key components: a Landmark Generation Module converting audio to facial landmarks, a Landmarks-Guide Noise approach that decouples audio by distributing noise according to landmarks, and a 3D Identity Diffusion network preserving identity characteristics. Experiments on MEAD and CREMA-D datasets demonstrate that ATL-Diff outperforms state-of-the-art methods across all metrics. Our approach achieves near real-time processing with high-quality animations, computational efficiency, and exceptional preservation of facial nuances. This advancement offers promising applications for virtual assistants, education, medical communication, and digital platforms. The source code is available at: https://github.com/sonvth/ATL-Diff","short_abstract":"Audio-driven talking head generation requires precise synchronization between facial animations and audio signals. This paper introduces ATL-Diff, a novel approach addressing synchronization limitations while reducing noise and computational costs. Our framework features three key components: a Landmark Generation Modu...","url_abs":"https://arxiv.org/abs/2507.12804","url_pdf":"https://arxiv.org/pdf/2507.12804v1","authors":"[\"Hoang-Son Vo\",\"Quang-Vinh Nguyen\",\"Seungwon Kim\",\"Hyung-Jeong Yang\",\"Soonja Yeom\",\"Soo-Hyung Kim\"]","published":"2025-07-17T05:40:51Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false,"code_links":[{"ID":612049,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2893351,"paper_url":"https://arxiv.org/abs/2507.12804","paper_title":"ATL-Diff: Audio-Driven Talking Head Generation with Early Landmarks-Guide Noise Diffusion","repo_url":"https://github.com/sonvth/ATL-Diff","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}