{"ID":2866951,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.18676","arxiv_id":"2509.18676","title":"3D Flow Diffusion Policy: Visuomotor Policy Learning via Generating Flow in 3D Space","abstract":"Learning robust visuomotor policies that generalize across diverse objects and interaction dynamics remains a central challenge in robotic manipulation. Most existing approaches rely on direct observation-to-action mappings or compress perceptual inputs into global or object-centric features, which often overlook localized motion cues critical for precise and contact-rich manipulation. We present 3D Flow Diffusion Policy (3D FDP), a novel framework that leverages scene-level 3D flow as a structured intermediate representation to capture fine-grained local motion cues. Our approach predicts the temporal trajectories of sampled query points and conditions action generation on these interaction-aware flows, implemented jointly within a unified diffusion architecture. This design grounds manipulation in localized dynamics while enabling the policy to reason about broader scene-level consequences of actions. Extensive experiments on the MetaWorld benchmark show that 3D FDP achieves state-of-the-art performance across 50 tasks, particularly excelling on medium and hard settings. Beyond simulation, we validate our method on eight real-robot tasks, where it consistently outperforms prior baselines in contact-rich and non-prehensile scenarios. These results highlight 3D flow as a powerful structural prior for learning generalizable visuomotor policies, supporting the development of more robust and versatile robotic manipulation. Robot demonstrations, additional results, and code can be found at https://sites.google.com/view/3dfdp/home.","short_abstract":"Learning robust visuomotor policies that generalize across diverse objects and interaction dynamics remains a central challenge in robotic manipulation. Most existing approaches rely on direct observation-to-action mappings or compress perceptual inputs into global or object-centric features, which often overlook local...","url_abs":"https://arxiv.org/abs/2509.18676","url_pdf":"https://arxiv.org/pdf/2509.18676v1","authors":"[\"Sangjun Noh\",\"Dongwoo Nam\",\"Kangmin Kim\",\"Geonhyup Lee\",\"Yeonguk Yu\",\"Raeyoung Kang\",\"Kyoobin Lee\"]","published":"2025-09-23T05:48:01Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"eess.SY\"]","methods":"[\"Diffusion Model\"]","project_urls":"[\"https://sites.google.com/view/3dfdp/home\"]","has_code":false,"code_links":[{"ID":609423,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2866951,"paper_url":"https://arxiv.org/abs/2509.18676","paper_title":"3D Flow Diffusion Policy: Visuomotor Policy Learning via Generating Flow in 3D Space","repo_url":"https://github.com/google/safevalues","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}