{"ID":2836569,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.21690","arxiv_id":"2511.21690","title":"TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos","abstract":"Learning new robot tasks on new platforms and in new scenes from only a handful of demonstrations remains challenging. While videos of other embodiments - humans and different robots - are abundant, differences in embodiment, camera, and environment hinder their direct use. We address the small-data problem by introducing a unifying, symbolic representation - a compact 3D \"trace-space\" of scene-level trajectories - that enables learning from cross-embodiment, cross-environment, and cross-task videos. We present TraceGen, a world model that predicts future motion in trace-space rather than pixel space, abstracting away appearance while retaining the geometric structure needed for manipulation. To train TraceGen at scale, we develop TraceForge, a data pipeline that transforms heterogeneous human and robot videos into consistent 3D traces, yielding a corpus of 123K videos and 1.8M observation-trace-language triplets. Pretraining on this corpus produces a transferable 3D motion prior that adapts efficiently: with just five target robot videos, TraceGen attains 80% success across four tasks while offering 50-600x faster inference than state-of-the-art video-based world models. In the more challenging case where only five uncalibrated human demonstration videos captured on a handheld phone are available, it still reaches 67.5% success on a real robot, highlighting TraceGen's ability to adapt across embodiments without relying on object detectors or heavy pixel-space generation.","short_abstract":"Learning new robot tasks on new platforms and in new scenes from only a handful of demonstrations remains challenging. While videos of other embodiments - humans and different robots - are abundant, differences in embodiment, camera, and environment hinder their direct use. We address the small-data problem by introduc...","url_abs":"https://arxiv.org/abs/2511.21690","url_pdf":"https://arxiv.org/pdf/2511.21690v1","authors":"[\"Seungjae Lee\",\"Yoonkyo Jung\",\"Inkook Chun\",\"Yao-Chih Lee\",\"Zikui Cai\",\"Hongjia Huang\",\"Aayush Talreja\",\"Tan Dat Dao\",\"Yongyuan Liang\",\"Jia-Bin Huang\",\"Furong Huang\"]","published":"2025-11-26T18:59:55Z","proceeding":"cs.RO","tasks":"[\"cs.RO\",\"cs.CV\",\"cs.LG\"]","methods":"[]","has_code":false}
