{"ID":2829130,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.13674","arxiv_id":"2512.13674","title":"Towards Interactive Intelligence for Digital Humans","abstract":"We introduce Interactive Intelligence, a novel paradigm of digital human that is capable of personality-aligned expression, adaptive interaction, and self-evolution. To realize this, we present Mio (Multimodal Interactive Omni-Avatar), an end-to-end framework composed of five specialized modules: Thinker, Talker, Face Animator, Body Animator, and Renderer. This unified architecture integrates cognitive reasoning with real-time multimodal embodiment to enable fluid, consistent interaction. Furthermore, we establish a new benchmark to rigorously evaluate the capabilities of interactive intelligence. Extensive experiments demonstrate that our framework achieves superior performance compared to state-of-the-art methods across all evaluated dimensions. Together, these contributions move digital humans beyond superficial imitation toward intelligent interaction.","short_abstract":"We introduce Interactive Intelligence, a novel paradigm of digital human that is capable of personality-aligned expression, adaptive interaction, and self-evolution. To realize this, we present Mio (Multimodal Interactive Omni-Avatar), an end-to-end framework composed of five specialized modules: Thinker, Talker, Face...","url_abs":"https://arxiv.org/abs/2512.13674","url_pdf":"https://arxiv.org/pdf/2512.13674v2","authors":"[\"Yiyi Cai\",\"Xuangeng Chu\",\"Xiwei Gao\",\"Sitong Gong\",\"Yifei Huang\",\"Caixin Kang\",\"Kunhang Li\",\"Haiyang Liu\",\"Ruicong Liu\",\"Yun Liu\",\"Dianwen Ng\",\"Zixiong Su\",\"Erwin Wu\",\"Yuhan Wu\",\"Dingkun Yan\",\"Tianyu Yan\",\"Chang Zeng\",\"Bo Zheng\",\"You Zhou\"]","published":"2025-12-15T18:57:35Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.CL\",\"cs.GR\",\"cs.HC\"]","methods":"[]","has_code":false}