{"ID":2881242,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.13104","arxiv_id":"2508.13104","title":"Precise Action-to-Video Generation Through Visual Action Prompts","abstract":"We present visual action prompts, a unified action representation for action-to-video generation of complex high-DoF interactions while maintaining transferable visual dynamics across domains. Action-driven video generation faces a precision-generality trade-off: existing methods using text, primitive actions, or coarse masks offer generality but lack precision, while agent-centric action signals provide precision at the cost of cross-domain transferability. To balance action precision and dynamic transferability, we propose to \"render\" actions into precise visual prompts as domain-agnostic representations that preserve both geometric precision and cross-domain adaptability for complex actions; specifically, we choose visual skeletons for their generality and accessibility. We propose robust pipelines to construct skeletons from two interaction-rich data sources - human-object interactions (HOI) and dexterous robotic manipulation - enabling cross-domain training of action-driven generative models. By integrating visual skeletons into pretrained video generation models via lightweight fine-tuning, we enable precise action control of complex interaction while preserving the learning of cross-domain dynamics. Experiments on EgoVid, RT-1 and DROID demonstrate the effectiveness of our proposed approach. Project page: https://zju3dv.github.io/VAP/.","short_abstract":"We present visual action prompts, a unified action representation for action-to-video generation of complex high-DoF interactions while maintaining transferable visual dynamics across domains. Action-driven video generation faces a precision-generality trade-off: existing methods using text, primitive actions, or coars...","url_abs":"https://arxiv.org/abs/2508.13104","url_pdf":"https://arxiv.org/pdf/2508.13104v1","authors":"[\"Yuang Wang\",\"Chao Wen\",\"Haoyu Guo\",\"Sida Peng\",\"Minghan Qin\",\"Hujun Bao\",\"Xiaowei Zhou\",\"Ruizhen Hu\"]","published":"2025-08-18T17:12:28Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.RO\"]","methods":"[]","has_code":false}
