{"ID":2824061,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.24321","arxiv_id":"2512.24321","title":"UniAct: Unified Motion Generation and Action Streaming for Humanoid Robots","abstract":"A long-standing objective in humanoid robotics is the realization of versatile agents capable of following diverse multimodal instructions with human-level flexibility. Despite advances in humanoid control, bridging high-level multimodal perception with whole-body execution remains a significant bottleneck. Existing methods often struggle to translate heterogeneous instructions -- such as language, music, and trajectories -- into stable, real-time actions. Here we show that UniAct, a two-stage framework integrating a fine-tuned MLLM with a causal streaming pipeline, enables humanoid robots to execute multimodal instructions with sub-500 ms latency. By unifying inputs through a shared discrete codebook via FSQ, UniAct ensures cross-modal alignment while constraining motions to a physically grounded manifold. This approach yields a 19% improvement in the success rate of zero-shot tracking of imperfect reference motions. We validate UniAct on UniMoCap, our 20-hour humanoid motion benchmark, demonstrating robust generalization across diverse real-world scenarios. Our results mark a critical step toward responsive, general-purpose humanoid assistants capable of seamless interaction through unified perception and control.","short_abstract":"A long-standing objective in humanoid robotics is the realization of versatile agents capable of following diverse multimodal instructions with human-level flexibility. Despite advances in humanoid control, bridging high-level multimodal perception with whole-body execution remains a significant bottleneck. Existing me...","url_abs":"https://arxiv.org/abs/2512.24321","url_pdf":"https://arxiv.org/pdf/2512.24321v1","authors":"[\"Nan Jiang\",\"Zimo He\",\"Wanhe Yu\",\"Lexi Pang\",\"Yunhao Li\",\"Hongjie Li\",\"Jieming Cui\",\"Yuhan Li\",\"Yizhou Wang\",\"Yixin Zhu\",\"Siyuan Huang\"]","published":"2025-12-30T16:20:13Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.RO\"]","methods":"[\"Large Language Model\"]","has_code":false}
