{"ID":2882489,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.10897","arxiv_id":"2508.10897","title":"Human-in-Context: Unified Cross-Domain 3D Human Motion Modeling via In-Context Learning","abstract":"This paper aims to model 3D human motion across domains, where a single model is expected to handle multiple modalities, tasks, and datasets. Existing cross-domain models often rely on domain-specific components and multi-stage training, which limits their practicality and scalability. To overcome these challenges, we propose a new setting to train a unified cross-domain model through a single process, eliminating the need for domain-specific components and multi-stage training. We first introduce Pose-in-Context (PiC), which leverages in-context learning to create a pose-centric cross-domain model. While PiC generalizes across multiple pose-based tasks and datasets, it encounters difficulties with modality diversity, prompting strategy, and contextual dependency handling. We thus propose Human-in-Context (HiC), an extension of PiC that broadens generalization across modalities, tasks, and datasets. HiC combines pose and mesh representations within a unified framework, expands task coverage, and incorporates larger-scale datasets. Additionally, HiC introduces a max-min similarity prompt sampling strategy to enhance generalization across diverse domains and a network architecture with dual-branch context injection for improved handling of contextual dependencies. Extensive experimental results show that HiC performs better than PiC in terms of generalization, data scale, and performance across a wide range of domains. These results demonstrate the potential of HiC for building a unified cross-domain 3D human motion model with improved flexibility and scalability. \nThe source codes and models are available at https://github.com/BradleyWang0416/Human-in-Context.","short_abstract":"This paper aims to model 3D human motion across domains, where a single model is expected to handle multiple modalities, tasks, and datasets. Existing cross-domain models often rely on domain-specific components and multi-stage training, which limits their practicality and scalability. To overcome these challenges, we...","url_abs":"https://arxiv.org/abs/2508.10897","url_pdf":"https://arxiv.org/pdf/2508.10897v1","authors":"[\"Mengyuan Liu\",\"Xinshun Wang\",\"Zhongbin Fang\",\"Deheng Ye\",\"Xia Li\",\"Tao Tang\",\"Songtao Wu\",\"Xiangtai Li\",\"Ming-Hsuan Yang\"]","published":"2025-08-14T17:59:23Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":610898,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2882489,"paper_url":"https://arxiv.org/abs/2508.10897","paper_title":"Human-in-Context: Unified Cross-Domain 3D Human Motion Modeling via In-Context Learning","repo_url":"https://github.com/BradleyWang0416/Human-in-Context","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
