{"ID":2838349,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.18173","arxiv_id":"2511.18173","title":"EgoControl: Controllable Egocentric Video Generation via 3D Full-Body Poses","abstract":"Egocentric video generation with fine-grained control through body motion is a key requirement towards embodied AI agents that can simulate, predict, and plan actions. In this work, we propose EgoControl, a pose-controllable video diffusion model trained on egocentric data. We train a video prediction model to condition future frame generation on explicit 3D body pose sequences. To achieve precise motion control, we introduce a novel pose representation that captures both global camera dynamics and articulated body movements, and integrate it through a dedicated control mechanism within the diffusion process. Given a short sequence of observed frames and a sequence of target poses, EgoControl generates temporally coherent and visually realistic future frames that align with the provided pose control. Experimental results demonstrate that EgoControl produces high-quality, pose-consistent egocentric videos, paving the way toward controllable embodied video simulation and understanding.","short_abstract":"Egocentric video generation with fine-grained control through body motion is a key requirement towards embodied AI agents that can simulate, predict, and plan actions. In this work, we propose EgoControl, a pose-controllable video diffusion model trained on egocentric data. We train a video prediction model to conditio...","url_abs":"https://arxiv.org/abs/2511.18173","url_pdf":"https://arxiv.org/pdf/2511.18173v1","authors":"[\"Enrico Pallotta\",\"Sina Mokhtarzadeh Azar\",\"Lars Doorenbos\",\"Serdar Ozsoy\",\"Umar Iqbal\",\"Juergen Gall\"]","published":"2025-11-22T19:56:39Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false}