{"ID":2866711,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.20322","arxiv_id":"2509.20322","title":"VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation","abstract":"Humanoid loco-manipulation in unstructured environments demands tight integration of egocentric perception and whole-body control. However, existing approaches either depend on external motion capture systems or fail to generalize across diverse tasks. We introduce VisualMimic, a visual sim-to-real framework that unifies egocentric vision with hierarchical whole-body control for humanoid robots. VisualMimic combines a task-agnostic low-level keypoint tracker -- trained from human motion data via a teacher-student scheme -- with a task-specific high-level policy that generates keypoint commands from visual and proprioceptive input. To ensure stable training, we inject noise into the low-level policy and clip high-level actions using human motion statistics. VisualMimic enables zero-shot transfer of visuomotor policies trained in simulation to real humanoid robots, accomplishing a wide range of loco-manipulation tasks such as box lifting, pushing, football dribbling, and kicking. Beyond controlled laboratory settings, our policies also generalize robustly to outdoor environments. Videos are available at: https://visualmimic.github.io .","short_abstract":"Humanoid loco-manipulation in unstructured environments demands tight integration of egocentric perception and whole-body control. However, existing approaches either depend on external motion capture systems or fail to generalize across diverse tasks. We introduce VisualMimic, a visual sim-to-real framework that unifi...","url_abs":"https://arxiv.org/abs/2509.20322","url_pdf":"https://arxiv.org/pdf/2509.20322v2","authors":"[\"Shaofeng Yin\",\"Yanjie Ze\",\"Hong-Xing Yu\",\"C. Karen Liu\",\"Jiajun Wu\"]","published":"2025-09-24T17:10:02Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.CV\",\"cs.LG\"]","methods":"[]","has_code":false}
