{"ID":2882209,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.10423","arxiv_id":"2508.10423","title":"MASH: Cooperative-Heterogeneous Multi-Agent Reinforcement Learning for Single Humanoid Robot Locomotion","abstract":"This paper proposes a novel method to enhance locomotion for a single humanoid robot through cooperative-heterogeneous multi-agent deep reinforcement learning (MARL). While most existing methods typically employ single-agent reinforcement learning algorithms for a single humanoid robot or MARL algorithms for multi-robot system tasks, we propose a distinct paradigm: applying cooperative-heterogeneous MARL to optimize locomotion for a single humanoid robot. The proposed method, multi-agent reinforcement learning for single humanoid locomotion (MASH), treats each limb (legs and arms) as an independent agent that explores the robot's action space while sharing a global critic for cooperative learning. Experiments demonstrate that MASH accelerates training convergence and improves whole-body cooperation ability, outperforming conventional single-agent reinforcement learning methods. This work advances the integration of MARL into single-humanoid-robot control, offering new insights into efficient locomotion strategies.","short_abstract":"This paper proposes a novel method to enhance locomotion for a single humanoid robot through cooperative-heterogeneous multi-agent deep reinforcement learning (MARL). While most existing methods typically employ single-agent reinforcement learning algorithms for a single humanoid robot or MARL algorithms for multi-robo...","url_abs":"https://arxiv.org/abs/2508.10423","url_pdf":"https://arxiv.org/pdf/2508.10423v1","authors":"[\"Qi Liu\",\"Xiaopeng Zhang\",\"Mingshan Tan\",\"Shuaikang Ma\",\"Jinliang Ding\",\"Yanjie Li\"]","published":"2025-08-14T07:54:31Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.AI\",\"eess.SY\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
