{"ID":2880532,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.13534","arxiv_id":"2508.13534","title":"MimicFunc: Imitating Tool Manipulation from a Single Human Video via Functional Correspondence","abstract":"Imitating tool manipulation from human videos offers an intuitive approach to teaching robots, while also providing a promising and scalable alternative to labor-intensive teleoperation data collection for visuomotor policy learning. While humans can mimic tool manipulation behavior by observing others perform a task just once and effortlessly transfer the skill to diverse tools for functionally equivalent tasks, current robots struggle to achieve this level of generalization. A key challenge lies in establishing function-level correspondences, considering the significant geometric variations among functionally similar tools, referred to as intra-function variations. To address this challenge, we propose MimicFunc, a framework that establishes functional correspondences with function frame, a function-centric local coordinate frame constructed with keypoint-based abstraction, for imitating tool manipulation skills. Experiments demonstrate that MimicFunc effectively enables the robot to generalize the skill from a single RGB-D human video to manipulating novel tools for functionally equivalent tasks. Furthermore, leveraging MimicFunc's one-shot generalization capability, the generated rollouts can be used to train visuomotor policies without requiring labor-intensive teleoperation data collection for novel objects. Our code and video are available at https://sites.google.com/view/mimicfunc.","short_abstract":"Imitating tool manipulation from human videos offers an intuitive approach to teaching robots, while also providing a promising and scalable alternative to labor-intensive teleoperation data collection for visuomotor policy learning. While humans can mimic tool manipulation behavior by observing others perform a task j...","url_abs":"https://arxiv.org/abs/2508.13534","url_pdf":"https://arxiv.org/pdf/2508.13534v1","authors":"[\"Chao Tang\",\"Anxing Xiao\",\"Yuhong Deng\",\"Tianrun Hu\",\"Wenlong Dong\",\"Hanbo Zhang\",\"David Hsu\",\"Hong Zhang\"]","published":"2025-08-19T05:49:47Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.AI\",\"cs.CV\"]","methods":"[]","project_urls":"[\"https://sites.google.com/view/mimicfunc\"]","has_code":false,"code_links":[{"ID":610683,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2880532,"paper_url":"https://arxiv.org/abs/2508.13534","paper_title":"MimicFunc: Imitating Tool Manipulation from a Single Human Video via Functional Correspondence","repo_url":"https://github.com/google/safevalues","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}