{"ID":3084847,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T03:54:17.966829144Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.05699","arxiv_id":"2606.05699","title":"DexFuture: Hierarchical Future-State Visuomotor Targeting for Bimanual Dexterous Tool Use","abstract":"Bimanual dexterous tool use remains challenging for robots due to high-dimensional hand configurations and complex hand-tool-object dynamics and contact. Most existing control policies depend on future configuration references provided from demonstrations, while future action-conditioned world models require slow online planning over high-dimensional action sequences. A significant challenge is generating a dynamically consistent future reference trajectory without relying on privileged states from demonstrations or slow counterfactual planning. We propose DexFuture, a hierarchical system that couples a high-level Future-State Visuomotor Target Predictor with a low-level Target-Conditioned Structured Dexterous Policy. Conditioned on egocentric RGB, proprioceptive and geometric history, the high-level predictor constructs structured hand-tool-object visuomotor embeddings and uses a horizon-conditioned transformer to generate a multi-step future target trajectory. Then, the low-level policy tracks them with a target-conditioned per-link transformer. This hierarchy decouples coarse future reference generation from fine-grained action control, and slow long-horizon semantic prediction from high-frequency execution. On OakInk2 bimanual tool-use tasks, DexFuture achieves 90% of the privileged-oracle performance, compared to 7% for a no-reference policy. DexFuture operates at 60 Hz, approximately 250 times faster than DexWM-style Cross-Entropy Method (CEM) planning with a future action-conditioned world model.","short_abstract":"Bimanual dexterous tool use remains challenging for robots due to high-dimensional hand configurations and complex hand-tool-object dynamics and contact. Most existing control policies depend on future configuration references provided from demonstrations, while future action-conditioned world models require slow onlin...","url_abs":"https://arxiv.org/abs/2606.05699","url_pdf":"https://arxiv.org/pdf/2606.05699v1","authors":"[\"Runfa Blark Li\",\"Kuang-Ting Tu\",\"Nikola Raicevic\",\"Dwait Bhatt\",\"Xinshuang Liu\",\"Keito Suzuki\",\"Ki Myung Brian Lee\",\"Nikolay Atanasov\",\"Truong Nguyen\"]","published":"2026-06-04T04:37:23Z","proceeding":"cs.RO","tasks":"[\"cs.RO\"]","methods":"[\"Transformer\"]","has_code":false}