{"ID":2921590,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-07T06:21:31.910515433Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.01027","arxiv_id":"2606.01027","title":"$τ_0$-WM: A Unified Video-Action World Model for Robotic Manipulation","abstract":"Robotic manipulation requires models that generate executable actions while anticipating and evaluating their future consequences before physical execution. We present $τ_0$-World Model ($τ_0$-WM), a unified video-action world model that integrates policy learning, video prediction, and action evaluation within a single future-predictive framework. Built on a shared video diffusion backbone, $τ_0$-WM provides two complementary interfaces. First, a video action model jointly predicts future visual latents and continuous action chunks from multi-view observations, language instructions, and robot state. Second, an action-conditioned video simulator rolls out candidate action chunks into multi-view futures and predicts dense task-progress scores. The model is trained on approximately $27{,}300$ hours of real-robot teleoperation, UMI-style interaction, egocentric human videos, and rollout or failure trajectories using modality-specific supervision masks. At inference time, $τ_0$-WM uses test-time computation to sample action candidates, rank them with re-denoising consistency, and invoke simulator-based rectification for low-quality candidates. On challenging long-horizon and fine-grained robotic manipulation tasks, $τ_0$-WM shows superior performance over other relevant baselines.","short_abstract":"Robotic manipulation requires models that generate executable actions while anticipating and evaluating their future consequences before physical execution. We present $τ_0$-World Model ($τ_0$-WM), a unified video-action world model that integrates policy learning, video prediction, and action evaluation within a singl...","url_abs":"https://arxiv.org/abs/2606.01027","url_pdf":"https://arxiv.org/pdf/2606.01027v1","authors":"[\"Pengfei Zhou\",\"Shengcong Chen\",\"Di Chen\",\"Jiaxu Wang\",\"Rongjun Jin\",\"Bingwen Zhu\",\"Yike Pan\",\"Songen Gu\",\"Kuanning Wang\",\"Shufeng Nan\",\"Xingyu Qiu\",\"Chenhao Qiu\",\"Pu Yang\",\"Yunuo Cai\",\"Jianxiong Gao\",\"Yifan Li\",\"Yanwei Fu\",\"Xiangyu Yue\",\"Zhi Chen\",\"Jianlan Luo\"]","published":"2026-05-31T05:35:36Z","proceeding":"cs.RO","tasks":"[\"cs.RO\"]","methods":"[\"Diffusion Model\"]","has_code":false}
