{"ID":2847014,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.00917","arxiv_id":"2511.00917","title":"Maestro: Orchestrating Robotics Modules with Vision-Language Models for Zero-Shot Generalist Robots","abstract":"Today's best-explored routes towards generalist robots center on collecting ever larger \"observations-in actions-out\" robotics datasets to train large end-to-end models, copying a recipe that has worked for vision-language models (VLMs). We pursue a road less traveled: building generalist policies directly around VLMs by augmenting their general capabilities with specific robot capabilities encapsulated in a carefully curated set of perception, planning, and control modules. In Maestro, a VLM coding agent dynamically composes these modules into a programmatic policy for the current task and scenario. Maestro's architecture benefits from a streamlined closed-loop interface without many manually imposed structural constraints, and a comprehensive and diverse tool repertoire. As a result, it largely surpasses today's VLA models for zero-shot performance on challenging manipulation skills. Further, Maestro is easily extensible to incorporate new modules, easily editable to suit new embodiments such as a quadruped-mounted arm, and even easily adapts from minimal real-world experiences through local code edits.","short_abstract":"Today's best-explored routes towards generalist robots center on collecting ever larger \"observations-in actions-out\" robotics datasets to train large end-to-end models, copying a recipe that has worked for vision-language models (VLMs). We pursue a road less traveled: building generalist policies directly around VLMs...","url_abs":"https://arxiv.org/abs/2511.00917","url_pdf":"https://arxiv.org/pdf/2511.00917v2","authors":"[\"Junyao Shi\",\"Rujia Yang\",\"Kaitian Chao\",\"Selina Bingqing Wan\",\"Yifei Shao\",\"Jiahui Lei\",\"Jianing Qian\",\"Long Le\",\"Pratik Chaudhari\",\"Kostas Daniilidis\",\"Chuan Wen\",\"Dinesh Jayaraman\"]","published":"2025-11-02T12:34:37Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}
