{"ID":2885049,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.05091","arxiv_id":"2508.05091","title":"PoseGen: In-Context LoRA Finetuning for Pose-Controllable Long Human Video Generation","abstract":"Generating temporally coherent, long-duration videos with precise control over subject identity and movement remains a fundamental challenge for contemporary diffusion-based models, which often suffer from identity drift and are limited to short video length. We present PoseGen, a novel framework that generates human videos of extended duration from a single reference image and a driving video. Our contributions include an in-context LoRA finetuning design that injects subject appearance at the token level for identity preservation, while simultaneously conditioning on pose information at the channel level for fine-grained motion control. To overcome duration limits, we introduce a segment-interleaved generation strategy, where non-overlapping segments are first generated with improved background consistency through a shared KV-cache mechanism, and then stitched into a continuous sequence via pose-aware interpolated generation. Despite being trained on a remarkably small 33-hour video dataset, PoseGen demonstrates superior performance over state-of-the-art baselines in identity fidelity, pose accuracy, and temporal consistency. Code is available at https://github.com/Jessie459/PoseGen .","short_abstract":"Generating temporally coherent, long-duration videos with precise control over subject identity and movement remains a fundamental challenge for contemporary diffusion-based models, which often suffer from identity drift and are limited to short video length. We present PoseGen, a novel framework that generates human v...","url_abs":"https://arxiv.org/abs/2508.05091","url_pdf":"https://arxiv.org/pdf/2508.05091v2","authors":"[\"Jingxuan He\",\"Busheng Su\",\"Finn Wong\"]","published":"2025-08-07T07:19:02Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\",\"LoRA\"]","has_code":false,"code_links":[{"ID":611146,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2885049,"paper_url":"https://arxiv.org/abs/2508.05091","paper_title":"PoseGen: In-Context LoRA Finetuning for Pose-Controllable Long Human Video Generation","repo_url":"https://github.com/Jessie459/PoseGen","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
