{"ID":3050049,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-06T11:59:53.540122282Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04829","arxiv_id":"2606.04829","title":"M3imic: Learning a Versatile Whole-Body Controller for Multimodal Motion Mimicking","abstract":"Building a general-purpose whole-body controller is essential for enabling diverse motion capabilities in humanoid robots across a wide range of downstream tasks, including locomotion and loco-manipulation. Different tasks rely on distinct motion reference modalities: locomotion primarily depends on coordinated robot joint trajectories, whereas manipulation requires precise end-effector trajectory tracking. Existing methods often overlook the representational mismatch between dense robot joint angles and sparse end-effector poses. To address this, we propose Multi-Modal Mimic (M3imic), a versatile multi-modal whole-body control framework that unifies heterogeneous motion reference modalities, including robot joint angles, human pose trajectories, and end-effector poses, using modality-specific encoders to map them into a shared latent space. Leveraging large-scale reinforcement learning in the simulator, we train a single policy that achieves sim-to-real transfer across multiple motion reference modalities without modality-specific retraining. Extensive simulation and real-world experiments on the Unitree G1 robot are conducted to evaluate the proposed framework. In simulation, the policy achieves a peak success rate of 98.42% on an unseen test dataset, demonstrating its exceptional generalization capability. The code is available at https://github.com/Renforce-Dynamics/MultiModalWBC","short_abstract":"Building a general-purpose whole-body controller is essential for enabling diverse motion capabilities in humanoid robots across a wide range of downstream tasks, including locomotion and loco-manipulation. Different tasks rely on distinct motion reference modalities: locomotion primarily depends on coordinated robot j...","url_abs":"https://arxiv.org/abs/2606.04829","url_pdf":"https://arxiv.org/pdf/2606.04829v1","authors":"[\"Zuxing Lu\",\"Ziang Zheng\",\"Yao Lyu\",\"Jingyu Liu\",\"Feihong Zhang\",\"Song Lu\",\"Xin Yuan\",\"Changyin Sun\",\"Xingxing Zuo\",\"Shengbo Eben Li\"]","published":"2026-06-03T12:52:37Z","proceeding":"cs.RO","tasks":"[\"cs.RO\"]","methods":"[\"Reinforcement Learning\"]","has_code":false,"code_links":[{"ID":612776,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-04T02:13:16.786527022Z","DeletedAt":null,"paper_id":3050049,"paper_url":"https://arxiv.org/abs/2606.04829","paper_title":"M3imic: Learning a Versatile Whole-Body Controller for Multimodal Motion Mimicking","repo_url":"https://github.com/Renforce-Dynamics/MultiModalWBC","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}