{"ID":2829091,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.13604","arxiv_id":"2512.13604","title":"LongVie 2: Multimodal Controllable Ultra-Long Video World Model","abstract":"Building video world models upon pretrained video generation systems represents an important yet challenging step toward general spatiotemporal intelligence. A world model should possess three essential properties: controllability, long-term visual quality, and temporal consistency. To this end, we take a progressive approach-first enhancing controllability and then extending toward long-term, high-quality generation. We present LongVie 2, an end-to-end autoregressive framework trained in three stages: (1) Multi-modal guidance, which integrates dense and sparse control signals to provide implicit world-level supervision and improve controllability; (2) Degradation-aware training on the input frame, bridging the gap between training and long-term inference to maintain high visual quality; and (3) History-context guidance, which aligns contextual information across adjacent clips to ensure temporal consistency. We further introduce LongVGenBench, a comprehensive benchmark comprising 100 high-resolution one-minute videos covering diverse real-world and synthetic environments. Extensive experiments demonstrate that LongVie 2 achieves state-of-the-art performance in long-range controllability, temporal coherence, and visual fidelity, and supports continuous video generation lasting up to five minutes, marking a significant step toward unified video world modeling.","short_abstract":"Building video world models upon pretrained video generation systems represents an important yet challenging step toward general spatiotemporal intelligence. A world model should possess three essential properties: controllability, long-term visual quality, and temporal consistency. To this end, we take a progressive a...","url_abs":"https://arxiv.org/abs/2512.13604","url_pdf":"https://arxiv.org/pdf/2512.13604v1","authors":"[\"Jianxiong Gao\",\"Zhaoxi Chen\",\"Xian Liu\",\"Junhao Zhuang\",\"Chengming Xu\",\"Jianfeng Feng\",\"Yu Qiao\",\"Yanwei Fu\",\"Chenyang Si\",\"Ziwei Liu\"]","published":"2025-12-15T17:59:58Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
