{"ID":2875713,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.01106","arxiv_id":"2509.01106","title":"Robix: A Unified Model for Robot Interaction, Reasoning and Planning","abstract":"We introduce Robix, a unified model that integrates robot reasoning, task planning, and natural language interaction within a single vision-language architecture. Acting as the high-level cognitive layer in a hierarchical robot system, Robix dynamically generates atomic commands for the low-level controller and verbal responses for human interaction, enabling robots to follow complex instructions, plan long-horizon tasks, and interact naturally with humans within an end-to-end framework. Robix further introduces novel capabilities such as proactive dialogue, real-time interruption handling, and context-aware commonsense reasoning during task execution. At its core, Robix leverages chain-of-thought reasoning and adopts a three-stage training strategy: (1) continued pretraining to enhance foundational embodied reasoning abilities including 3D spatial understanding, visual grounding, and task-centric reasoning; (2) supervised finetuning to model human-robot interaction and task planning as a unified reasoning-action sequence; and (3) reinforcement learning to improve reasoning-action consistency and long-horizon task coherence. Extensive experiments demonstrate that Robix outperforms both open-source and commercial baselines (e.g., GPT-4o and Gemini 2.5 Pro) in interactive task execution, demonstrating strong generalization across diverse instruction types (e.g., open-ended, multi-stage, constrained, invalid, and interrupted) and various user-involved tasks such as table bussing, grocery shopping, and dietary filtering.","short_abstract":"We introduce Robix, a unified model that integrates robot reasoning, task planning, and natural language interaction within a single vision-language architecture. Acting as the high-level cognitive layer in a hierarchical robot system, Robix dynamically generates atomic commands for the low-level controller and verbal...","url_abs":"https://arxiv.org/abs/2509.01106","url_pdf":"https://arxiv.org/pdf/2509.01106v2","authors":"[\"Huang Fang\",\"Mengxi Zhang\",\"Heng Dong\",\"Wei Li\",\"Zixuan Wang\",\"Qifeng Zhang\",\"Xueyun Tian\",\"Yucheng Hu\",\"Hang Li\"]","published":"2025-09-01T03:53:47Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CV\",\"cs.RO\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}