{"ID":2835653,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.23475","arxiv_id":"2511.23475","title":"AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement","abstract":"Recently, multi-person video generation has started to gain prominence. While a few preliminary works have explored audio-driven multi-person talking video generation, they often face challenges due to the high costs of diverse multi-person data collection and the difficulty of driving multiple identities with coherent interactivity. To address these challenges, we propose AnyTalker, a multi-person generation framework that features an extensible multi-stream processing architecture. Specifically, we extend Diffusion Transformer's attention block with a novel identity-aware attention mechanism that iteratively processes identity-audio pairs, allowing arbitrary scaling of drivable identities. Besides, training multi-person generative models demands massive multi-person data. Our proposed training pipeline depends solely on single-person videos to learn multi-person speaking patterns and refines interactivity with only a few real multi-person clips. Furthermore, we contribute a targeted metric and dataset designed to evaluate the naturalness and interactivity of the generated multi-person videos. Extensive experiments demonstrate that AnyTalker achieves remarkable lip synchronization, visual quality, and natural interactivity, striking a favorable balance between data costs and identity scalability.","short_abstract":"Recently, multi-person video generation has started to gain prominence. While a few preliminary works have explored audio-driven multi-person talking video generation, they often face challenges due to the high costs of diverse multi-person data collection and the difficulty of driving multiple identities with coherent...","url_abs":"https://arxiv.org/abs/2511.23475","url_pdf":"https://arxiv.org/pdf/2511.23475v1","authors":"[\"Zhizhou Zhong\",\"Yicheng Ji\",\"Zhe Kong\",\"Yiying Liu\",\"Jiarui Wang\",\"Jiasun Feng\",\"Lupeng Liu\",\"Xiangyi Wang\",\"Yanjia Li\",\"Yuqing She\",\"Ying Qin\",\"Huan Li\",\"Shuiyang Mao\",\"Wei Liu\",\"Wenhan Luo\"]","published":"2025-11-28T18:59:01Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\",\"Transformer\"]","has_code":false}
