{"ID":2851648,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.19430","arxiv_id":"2510.19430","title":"GigaBrain-0: A World Model-Powered Vision-Language-Action Model","abstract":"Training Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of physical data collection severely limits the scalability and generalization capacity of current VLA systems. To address this challenge, we introduce GigaBrain-0, a novel VLA foundation model empowered by world model-generated data (e.g., video generation, real2real transfer, human transfer, view transfer, sim2real transfer data). By leveraging world models to generate diverse data at scale, GigaBrain-0 significantly reduces reliance on real robot data while improving cross-task generalization. Our approach further improves policy robustness through RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, enabling the model to reason about spatial geometry, object states, and long-horizon dependencies during task execution. This leads to substantial gains in real-world performance on dexterous, long-horizon, and mobile manipulation tasks. Extensive experiments demonstrate that GigaBrain-0 achieves superior generalization across variations in appearances (e.g., textures, colors), object placements, and camera viewpoints. Additionally, we present GigaBrain-0-Small, an optimized lightweight variant designed to run efficiently on devices such as the NVIDIA Jetson AGX Orin.","short_abstract":"Training Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of physical data collection severely limits the scalability and generalization capacity of current VLA systems. \nTo address this cha...","url_abs":"https://arxiv.org/abs/2510.19430","url_pdf":"https://arxiv.org/pdf/2510.19430v3","authors":"[\"GigaBrain Team\",\"Angen Ye\",\"Boyuan Wang\",\"Chaojun Ni\",\"Guan Huang\",\"Guosheng Zhao\",\"Haoyun Li\",\"Jie Li\",\"Jiagang Zhu\",\"Lv Feng\",\"Peng Li\",\"Qiuping Deng\",\"Runqi Ouyang\",\"Wenkang Qin\",\"Xinze Chen\",\"Xiaofeng Wang\",\"Yang Wang\",\"Yifan Li\",\"Yilong Li\",\"Yiran Ding\",\"Yuan Xu\",\"Yun Ye\",\"Yukun Zhou\",\"Zhehao Dong\",\"Zhenan Wang\",\"Zhichao Liu\",\"Zheng Zhu\"]","published":"2025-10-22T09:57:13Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.CV\"]","methods":"[]","has_code":false}
