{"ID":2867857,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.18046","arxiv_id":"2509.18046","title":"HuMam: Humanoid Motion Control via End-to-End Deep Reinforcement Learning with Mamba","abstract":"End-to-end reinforcement learning (RL) for humanoid locomotion is appealing for its compact perception-action mapping, yet practical policies often suffer from training instability, inefficient feature fusion, and high actuation cost. We present HuMam, a state-centric end-to-end RL framework that employs a single-layer Mamba encoder to fuse robot-centric states with oriented footstep targets and a continuous phase clock. The policy outputs joint position targets tracked by a low-level PD loop and is optimized with PPO. A concise six-term reward balances contact quality, swing smoothness, foot placement, posture, and body stability while implicitly promoting energy saving. On the JVRC-1 humanoid in mc-mujoco, HuMam consistently improves learning efficiency, training stability, and overall task performance over a strong feedforward baseline, while reducing power consumption and torque peaks. To our knowledge, this is the first end-to-end humanoid RL controller that adopts Mamba as the fusion backbone, demonstrating tangible gains in efficiency, stability, and control economy.","short_abstract":"End-to-end reinforcement learning (RL) for humanoid locomotion is appealing for its compact perception-action mapping, yet practical policies often suffer from training instability, inefficient feature fusion, and high actuation cost. We present HuMam, a state-centric end-to-end RL framework that employs a single-layer...","url_abs":"https://arxiv.org/abs/2509.18046","url_pdf":"https://arxiv.org/pdf/2509.18046v2","authors":"[\"Yinuo Wang\",\"Yuanyang Qi\",\"Jinzhao Zhou\",\"Pengxiang Meng\",\"Xiaowen Tao\"]","published":"2025-09-22T17:19:55Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.AI\",\"cs.ET\",\"eess.SP\",\"eess.SY\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}