{"ID":2858610,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.06710","arxiv_id":"2510.06710","title":"RLinf-VLA: A Unified and Efficient Framework for Reinforcement Learning of Vision-Language-Action Models","abstract":"Recent advances in vision-language-action (VLA) models have motivated the extension of their capabilities to embodied settings, where reinforcement learning (RL) offers a principled way to optimize task success through interaction. However, existing methods remain fragmented, lacking both a unified platform for fair comparison across architectures and algorithms and an efficient system design for scalable training. To address these challenges, we introduce RLinf-VLA, a unified and efficient framework for scalable RL training of VLA models. RLinf-VLA achieves unification by providing a unified interface that standardizes the integration of diverse VLA architectures, multiple RL algorithms, and heterogeneous simulators, enabling extensibility. To ensure efficiency, the system adopts a flexible resource allocation architecture for rendering, inference, and training workloads in RL pipelines. In particular, for GPU-parallelized simulators, RLinf-VLA introduces a hybrid fine-grained pipeline allocation strategy, yielding a 1.61x-1.88x training speedup. Using this unified system, models trained with RLinf-VLA demonstrate consistent performance improvements of approximately 20-85% across multiple simulation benchmarks, including LIBERO, ManiSkill, and RoboTwin. Furthermore, we distill a set of training practices for effective RL-based VLA training. We position RLinf-VLA as a foundational system to enable efficient, unified, and reproducible research in embodied intelligence.","short_abstract":"Recent advances in vision-language-action (VLA) models have motivated the extension of their capabilities to embodied settings, where reinforcement learning (RL) offers a principled way to optimize task success through interaction. However, existing methods remain fragmented, lacking both a unified platform for fair co...","url_abs":"https://arxiv.org/abs/2510.06710","url_pdf":"https://arxiv.org/pdf/2510.06710v2","authors":"[\"Hongzhi Zang\",\"Mingjie Wei\",\"Si Xu\",\"Yongji Wu\",\"Zhen Guo\",\"Yuanqing Wang\",\"Hao Lin\",\"Peihong Wang\",\"Liangzhi Shi\",\"Yuqing Xie\",\"Zhexuan Xu\",\"Zhihao Liu\",\"Kang Chen\",\"Wenhao Tang\",\"Quanlu Zhang\",\"Weinan Zhang\",\"Chao Yu\",\"Yu Wang\"]","published":"2025-10-08T07:05:13Z","proceeding":"cs.RO","tasks":"[\"cs.RO\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
