{"ID":2863697,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.24917","arxiv_id":"2509.24917","title":"From Code to Action: Hierarchical Learning of Diffusion-VLM Policies","abstract":"Imitation learning for robotic manipulation often suffers from limited generalization and data scarcity, especially in complex, long-horizon tasks. In this work, we introduce a hierarchical framework that leverages code-generating vision-language models (VLMs) in combination with low-level diffusion policies to effectively imitate and generalize robotic behavior. Our key insight is to treat open-source robotic APIs not only as execution interfaces but also as sources of structured supervision: the associated subtask functions - when exposed - can serve as modular, semantically meaningful labels. We train a VLM to decompose task descriptions into executable subroutines, which are then grounded through a diffusion policy trained to imitate the corresponding robot behavior. To handle the non-Markovian nature of both code execution and certain real-world tasks, such as object swapping, our architecture incorporates a memory mechanism that maintains subtask context across time. We find that this design enables interpretable policy decomposition, improves generalization when compared to flat policies and enables separate evaluation of high-level planning and low-level control.","short_abstract":"Imitation learning for robotic manipulation often suffers from limited generalization and data scarcity, especially in complex, long-horizon tasks. In this work, we introduce a hierarchical framework that leverages code-generating vision-language models (VLMs) in combination with low-level diffusion policies to effecti...","url_abs":"https://arxiv.org/abs/2509.24917","url_pdf":"https://arxiv.org/pdf/2509.24917v1","authors":"[\"Markus Peschl\",\"Pietro Mazzaglia\",\"Daniel Dijkman\"]","published":"2025-09-29T15:22:18Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.LG\"]","methods":"[\"Diffusion Model\",\"Language Model\"]","has_code":false}
