{"ID":2863494,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.24579","arxiv_id":"2509.24579","title":"U-DiT Policy: U-shaped Diffusion Transformers for Robotic Manipulation","abstract":"Diffusion-based methods have been acknowledged as a powerful paradigm for end-to-end visuomotor control in robotics. Most existing approaches adopt a Diffusion Policy in U-Net architecture (DP-U), which, while effective, suffers from limited global context modeling and over-smoothing artifacts. To address these issues, we propose U-DiT Policy, a novel U-shaped Diffusion Transformer framework. U-DiT preserves the multi-scale feature fusion advantages of U-Net while integrating the global context modeling capability of Transformers, thereby enhancing representational power and policy expressiveness. We evaluate U-DiT extensively across both simulation and real-world robotic manipulation tasks. In simulation, U-DiT achieves an average performance gain of 10\\% over baseline methods and surpasses Transformer-based diffusion policies (DP-T) that use AdaLN blocks by 6\\% under comparable parameter budgets. On real-world robotic tasks, U-DiT demonstrates superior generalization and robustness, achieving an average improvement of 22.5\\% over DP-U. In addition, robustness and generalization experiments under distractor and lighting variations further highlight the advantages of U-DiT. These results highlight the effectiveness and practical potential of U-DiT Policy as a new foundation for diffusion-based robotic manipulation.","short_abstract":"Diffusion-based methods have been acknowledged as a powerful paradigm for end-to-end visuomotor control in robotics. Most existing approaches adopt a Diffusion Policy in U-Net architecture (DP-U), which, while effective, suffers from limited global context modeling and over-smoothing artifacts. To address these issues,...","url_abs":"https://arxiv.org/abs/2509.24579","url_pdf":"https://arxiv.org/pdf/2509.24579v1","authors":"[\"Linzhi Wu\",\"Aoran Mei\",\"Xiyue Wang\",\"Guo-Niu Zhu\",\"Zhongxue Gan\"]","published":"2025-09-29T10:38:53Z","proceeding":"cs.RO","tasks":"[\"cs.RO\"]","methods":"[\"Diffusion Model\",\"Transformer\"]","has_code":false}
