{"ID":2859481,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.06145","arxiv_id":"2510.06145","title":"Bimanual 3D Hand Motion and Articulation Forecasting in Everyday Images","abstract":"We tackle the problem of forecasting bimanual 3D hand motion \u0026 articulation from a single image in everyday settings. To address the lack of 3D hand annotations in diverse settings, we design an annotation pipeline consisting of a diffusion model to lift 2D hand keypoint sequences to 4D hand motion. For the forecasting model, we adopt a diffusion loss to account for the multimodality in hand motion distribution. Extensive experiments across 6 datasets show the benefits of training on diverse data with imputed labels (14% improvement) and effectiveness of our lifting (42% better) \u0026 forecasting (16.4% gain) models, over the best baselines, especially in zero-shot generalization to everyday images.","short_abstract":"We tackle the problem of forecasting bimanual 3D hand motion \u0026 articulation from a single image in everyday settings. To address the lack of 3D hand annotations in diverse settings, we design an annotation pipeline consisting of a diffusion model to lift 2D hand keypoint sequences to 4D hand motion. For the forecasting...","url_abs":"https://arxiv.org/abs/2510.06145","url_pdf":"https://arxiv.org/pdf/2510.06145v1","authors":"[\"Aditya Prakash\",\"David Forsyth\",\"Saurabh Gupta\"]","published":"2025-10-07T17:18:56Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Diffusion Model\"]","has_code":false}