{"ID":2874204,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.04996","arxiv_id":"2509.04996","title":"FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies","abstract":"Developing efficient Vision-Language-Action (VLA) policies is crucial for practical robotics deployment, yet current approaches face prohibitive computational costs and resource requirements. Existing diffusion-based VLA policies require multi-billion-parameter models and massive datasets to achieve strong performance. We tackle this efficiency challenge with two contributions: intermediate-modality fusion, which reallocates capacity to the diffusion head by pruning up to $50\\%$ of LLM layers, and action-specific Global-AdaLN conditioning, which cuts parameters by $20\\%$ through modular adaptation. We integrate these advances into a novel 950 M-parameter VLA called FLOWER. Pretrained in just 200 H100 GPU hours, FLOWER delivers competitive performance with bigger VLAs across $190$ tasks spanning ten simulation and real-world benchmarks and demonstrates robustness across diverse robotic embodiments. In addition, FLOWER achieves a new SoTA of 4.53 on the CALVIN ABC benchmark. Demos, code and pretrained weights are available at https://intuitive-robots.github.io/flower_vla/.","short_abstract":"Developing efficient Vision-Language-Action (VLA) policies is crucial for practical robotics deployment, yet current approaches face prohibitive computational costs and resource requirements. Existing diffusion-based VLA policies require multi-billion-parameter models and massive datasets to achieve strong performance....","url_abs":"https://arxiv.org/abs/2509.04996","url_pdf":"https://arxiv.org/pdf/2509.04996v1","authors":"[\"Moritz Reuss\",\"Hongyi Zhou\",\"Marcel Rühle\",\"Ömer Erdinç Yağmurlu\",\"Fabian Otto\",\"Rudolf Lioutikov\"]","published":"2025-09-05T10:43:12Z","proceeding":"cs.RO","tasks":"[\"cs.RO\"]","methods":"[\"Diffusion Model\",\"Large Language Model\"]","has_code":false}
