{"ID":2883463,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.09383","arxiv_id":"2508.09383","title":"X-UniMotion: Animating Human Images with Expressive, Unified and Identity-Agnostic Motion Latents","abstract":"We present X-UniMotion, a unified and expressive implicit latent representation for whole-body human motion, encompassing facial expressions, body poses, and hand gestures. Unlike prior motion transfer methods that rely on explicit skeletal poses and heuristic cross-identity adjustments, our approach encodes multi-granular motion directly from a single image into a compact set of four disentangled latent tokens -- one for facial expression, one for body pose, and one for each hand. These motion latents are both highly expressive and identity-agnostic, enabling high-fidelity, detailed cross-identity motion transfer across subjects with diverse identities, poses, and spatial configurations. To achieve this, we introduce a self-supervised, end-to-end framework that jointly learns the motion encoder and latent representation alongside a DiT-based video generative model, trained on large-scale, diverse human motion datasets. Motion-identity disentanglement is enforced via 2D spatial and color augmentations, as well as synthetic 3D renderings of cross-identity subject pairs under shared poses. Furthermore, we guide motion token learning with auxiliary decoders that promote fine-grained, semantically aligned, and depth-aware motion embeddings. Extensive experiments show that X-UniMotion outperforms state-of-the-art methods, producing highly expressive animations with superior motion fidelity and identity preservation.","short_abstract":"We present X-UniMotion, a unified and expressive implicit latent representation for whole-body human motion, encompassing facial expressions, body poses, and hand gestures. Unlike prior motion transfer methods that rely on explicit skeletal poses and heuristic cross-identity adjustments, our approach encodes multi-gran...","url_abs":"https://arxiv.org/abs/2508.09383","url_pdf":"https://arxiv.org/pdf/2508.09383v1","authors":"[\"Guoxian Song\",\"Hongyi Xu\",\"Xiaochen Zhao\",\"You Xie\",\"Tianpei Gu\",\"Zenan Li\",\"Chenxu Zhang\",\"Linjie Luo\"]","published":"2025-08-12T22:47:20Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[]","has_code":false}