{"ID":2849782,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.24795","arxiv_id":"2510.24795","title":"A Survey on Efficient Vision-Language-Action Models","abstract":"Vision-Language-Action models (VLAs) represent a significant frontier in embodied intelligence, aiming to bridge digital knowledge with physical-world interaction. Despite their remarkable performance, foundational VLAs are hindered by the prohibitive computational and data demands inherent to their large-scale architectures. While a surge of recent research has focused on enhancing VLA efficiency, the field lacks a unified framework to consolidate these disparate advancements. To bridge this gap, this survey presents the first comprehensive review of Efficient Vision-Language-Action models (Efficient VLAs) across the entire model-training-data pipeline. Specifically, we introduce a unified taxonomy to systematically organize the disparate efforts in this domain, categorizing current techniques into three core pillars: (1) Efficient Model Design, focusing on efficient architectures and model compression; (2) Efficient Training, which reduces computational burdens during model learning; and (3) Efficient Data Collection, which addresses the bottlenecks in acquiring and utilizing robotic data. Through a critical review of state-of-the-art methods within this framework, this survey not only establishes a foundational reference for the community but also summarizes representative applications, delineates key challenges, and charts a roadmap for future research. We maintain a continuously updated project page to track our latest developments: https://evla-survey.github.io/.","short_abstract":"Vision-Language-Action models (VLAs) represent a significant frontier in embodied intelligence, aiming to bridge digital knowledge with physical-world interaction. Despite their remarkable performance, foundational VLAs are hindered by the prohibitive computational and data demands inherent to their large-scale archite...","url_abs":"https://arxiv.org/abs/2510.24795","url_pdf":"https://arxiv.org/pdf/2510.24795v2","authors":"[\"Zhaoshu Yu\",\"Bo Wang\",\"Pengpeng Zeng\",\"Haonan Zhang\",\"Ji Zhang\",\"Zheng Wang\",\"Lianli Gao\",\"Jingkuan Song\",\"Nicu Sebe\",\"Heng Tao Shen\"]","published":"2025-10-27T17:57:33Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\",\"cs.RO\"]","methods":"[\"Generative Adversarial Network\"]","has_code":false}
