{"ID":2824602,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.22854","arxiv_id":"2512.22854","title":"ByteLoom: Weaving Geometry-Consistent Human-Object Interactions through Progressive Curriculum Learning","abstract":"Human-object interaction (HOI) video generation has garnered increasing attention due to its promising applications in digital humans, e-commerce, advertising, and robotics imitation learning. However, existing methods face two critical limitations: (1) a lack of effective mechanisms to inject multi-view information of the object into the model, leading to poor cross-view consistency, and (2) heavy reliance on fine-grained hand mesh annotations for modeling interaction occlusions. To address these challenges, we introduce ByteLoom, a Diffusion Transformer (DiT)-based framework that generates realistic HOI videos with geometrically consistent object illustration, using simplified human conditioning and 3D object inputs. We first propose an RCM-cache mechanism that leverages Relative Coordinate Maps (RCM) as a universal representation to maintain the object's geometric consistency while precisely controlling 6-DoF object transformations. To compensate for the scarcity of HOI datasets and leverage existing datasets, we further design a training curriculum that progressively enhances model capabilities and relaxes the need for hand mesh annotations. Extensive experiments demonstrate that our method faithfully preserves human identity and the object's multi-view geometry, while maintaining smooth motion and object manipulation.","short_abstract":"Human-object interaction (HOI) video generation has garnered increasing attention due to its promising applications in digital humans, e-commerce, advertising, and robotics imitation learning. However, existing methods face two critical limitations: (1) a lack of effective mechanisms to inject multi-view information of...","url_abs":"https://arxiv.org/abs/2512.22854","url_pdf":"https://arxiv.org/pdf/2512.22854v2","authors":"[\"Bangya Liu\",\"Xinyu Gong\",\"Zelin Zhao\",\"Ziyang Song\",\"Yulei Lu\",\"Suhui Wu\",\"Jun Zhang\",\"Suman Banerjee\",\"Hao Zhang\"]","published":"2025-12-28T09:38:36Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.GR\",\"cs.LG\"]","methods":"[\"Diffusion Model\",\"Transformer\"]","has_code":false}
