{"ID":2838169,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.17889","arxiv_id":"2511.17889","title":"MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots","abstract":"Grounding natural-language instructions into continuous control for quadruped robots remains a fundamental challenge in vision language action. Existing methods struggle to bridge high-level semantic reasoning and low-level actuation, leading to unstable grounding and weak generalization in the real world. To address these issues, we present MobileVLA-R1, a unified vision-language-action framework that enables explicit reasoning and continuous control for quadruped robots. We construct MobileVLA-CoT, a large-scale dataset of multi-granularity chain-of-thought (CoT) for embodied trajectories, providing structured reasoning supervision for alignment. Built upon this foundation, we introduce a two-stage training paradigm that combines supervised CoT alignment with GRPO reinforcement learning to enhance reasoning consistency, control stability, and long-horizon execution. Extensive evaluations on VLN and VLA tasks demonstrate superior performance over strong baselines, with approximately a 5% improvement. Real-world deployment on a quadruped robot validates robust performance in complex environments. Code: https://github.com/AIGeeksGroup/MobileVLA-R1. Website: https://aigeeksgroup.github.io/MobileVLA-R1.","short_abstract":"Grounding natural-language instructions into continuous control for quadruped robots remains a fundamental challenge in vision language action. Existing methods struggle to bridge high-level semantic reasoning and low-level actuation, leading to unstable grounding and weak generalization in the real world. To address t...","url_abs":"https://arxiv.org/abs/2511.17889","url_pdf":"https://arxiv.org/pdf/2511.17889v1","authors":"[\"Ting Huang\",\"Dongjian Li\",\"Rui Yang\",\"Zeyu Zhang\",\"Zida Yang\",\"Hao Tang\"]","published":"2025-11-22T02:34:10Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.CV\"]","methods":"[\"Reinforcement Learning\"]","has_code":false,"code_links":[{"ID":606741,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2838169,"paper_url":"https://arxiv.org/abs/2511.17889","paper_title":"MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots","repo_url":"https://github.com/AIGeeksGroup/MobileVLA-R1","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
