{"ID":2832850,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.04515","arxiv_id":"2512.04515","title":"EgoLCD: Egocentric Video Generation with Long Context Diffusion","abstract":"Generating long, coherent egocentric videos is difficult, as hand-object interactions and procedural tasks require reliable long-term memory. Existing autoregressive models suffer from content drift, where object identity and scene semantics degrade over time. To address this challenge, we introduce EgoLCD, an end-to-end framework for egocentric long-context video generation that treats long video synthesis as a problem of efficient and stable memory management. EgoLCD combines a Long-Term Sparse KV Cache for stable global context with an attention-based short-term memory, extended by LoRA for local adaptation. A Memory Regulation Loss enforces consistent memory usage, and Structured Narrative Prompting provides explicit temporal guidance. Extensive experiments on the EgoVid-5M benchmark demonstrate that EgoLCD achieves state-of-the-art performance in both perceptual quality and temporal consistency, effectively mitigating generative forgetting and representing a significant step toward building scalable world models for embodied AI. Code: https://github.com/AIGeeksGroup/EgoLCD. Website: https://aigeeksgroup.github.io/EgoLCD.","short_abstract":"Generating long, coherent egocentric videos is difficult, as hand-object interactions and procedural tasks require reliable long-term memory. Existing autoregressive models suffer from content drift, where object identity and scene semantics degrade over time. To address this challenge, we introduce EgoLCD, an end-to-e...","url_abs":"https://arxiv.org/abs/2512.04515","url_pdf":"https://arxiv.org/pdf/2512.04515v1","authors":"[\"Liuzhou Zhang\",\"Jiarui Ye\",\"Yuanlei Wang\",\"Ming Zhong\",\"Mingju Cao\",\"Wanke Xia\",\"Bowen Zeng\",\"Zeyu Zhang\",\"Hao Tang\"]","published":"2025-12-04T06:53:01Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\",\"LoRA\"]","has_code":false,"code_links":[{"ID":606281,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2832850,"paper_url":"https://arxiv.org/abs/2512.04515","paper_title":"EgoLCD: Egocentric Video Generation with Long Context Diffusion","repo_url":"https://github.com/AIGeeksGroup/EgoLCD","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
