{"ID":2856126,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.11027","arxiv_id":"2510.11027","title":"Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning","abstract":"While significant research has focused on developing embodied reasoning capabilities using Vision-Language Models (VLMs) or integrating advanced VLMs into Vision-Language-Action (VLA) models for end-to-end robot control, few studies directly address the critical gap between upstream VLM-based reasoning and downstream VLA policy learning. In this work, we take an initial step toward bridging embodied reasoning with VLA policy learning by introducing Vlaser - a Vision-Language-Action Model with synergistic embodied reasoning capability, which is a foundational vision-language model designed to integrate high-level reasoning with low-level control for embodied agents. Built upon the high-quality Vlaser-6M dataset, Vlaser achieves state-of-the-art performance across a range of embodied reasoning benchmarks - including spatial reasoning, embodied grounding, embodied QA, and task planning. Furthermore, we systematically examine how different VLM initializations affect supervised VLA fine-tuning, offering novel insights into mitigating the domain shift between internet-scale pre-training data and embodied-specific policy learning data. Based on these insights, our approach achieves state-of-the-art results on the WidowX benchmark and competitive performance on the Google Robot benchmark.","short_abstract":"While significant research has focused on developing embodied reasoning capabilities using Vision-Language Models (VLMs) or integrating advanced VLMs into Vision-Language-Action (VLA) models for end-to-end robot control, few studies directly address the critical gap between upstream VLM-based reasoning and downstream V...","url_abs":"https://arxiv.org/abs/2510.11027","url_pdf":"https://arxiv.org/pdf/2510.11027v2","authors":"[\"Ganlin Yang\",\"Tianyi Zhang\",\"Haoran Hao\",\"Weiyun Wang\",\"Yibin Liu\",\"Dehui Wang\",\"Guanzhou Chen\",\"Zijian Cai\",\"Junting Chen\",\"Weijie Su\",\"Wengang Zhou\",\"Yu Qiao\",\"Jifeng Dai\",\"Jiangmiao Pang\",\"Gen Luo\",\"Wenhai Wang\",\"Yao Mu\",\"Zhi Hou\"]","published":"2025-10-13T05:51:22Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false}
