{"ID":2871077,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.12201","arxiv_id":"2509.12201","title":"OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling","abstract":"The field of 4D world modeling - aiming to jointly capture spatial geometry and temporal dynamics - has witnessed remarkable progress in recent years, driven by advances in large-scale generative models and multimodal learning. However, the development of truly general 4D world models remains fundamentally constrained by the availability of high-quality data. Existing datasets and benchmarks often lack the dynamic complexity, multi-domain diversity, and spatial-temporal annotations required to support key tasks such as 4D geometric reconstruction, future prediction, and camera-control video generation. To address this gap, we introduce OmniWorld, a large-scale, multi-domain, multi-modal dataset specifically designed for 4D world modeling. OmniWorld consists of a newly collected OmniWorld-Game dataset and several curated public datasets spanning diverse domains. Compared with existing synthetic datasets, OmniWorld-Game provides richer modality coverage, larger scale, and more realistic dynamic interactions. Based on this dataset, we establish a challenging benchmark that exposes the limitations of current state-of-the-art (SOTA) approaches in modeling complex 4D environments. Moreover, fine-tuning existing SOTA methods on OmniWorld leads to significant performance gains across 4D reconstruction and video generation tasks, strongly validating OmniWorld as a powerful resource for training and evaluation.\nWe envision OmniWorld as a catalyst for accelerating the development of general-purpose 4D world models, ultimately advancing machines' holistic understanding of the physical world.","short_abstract":"The field of 4D world modeling - aiming to jointly capture spatial geometry and temporal dynamics - has witnessed remarkable progress in recent years, driven by advances in large-scale generative models and multimodal learning. However, the development of truly general 4D world models remains fundamentally constrained...","url_abs":"https://arxiv.org/abs/2509.12201","url_pdf":"https://arxiv.org/pdf/2509.12201v2","authors":"[\"Yang Zhou\",\"Yifan Wang\",\"Jianjun Zhou\",\"Wenzheng Chang\",\"Haoyu Guo\",\"Zizun Li\",\"Kaijing Ma\",\"Xinyue Li\",\"Yating Wang\",\"Haoyi Zhu\",\"Mingyu Liu\",\"Dingning Liu\",\"Jiange Yang\",\"Zhoujie Fu\",\"Junyi Chen\",\"Chunhua Shen\",\"Jiangmiao Pang\",\"Kaipeng Zhang\",\"Tong He\"]","published":"2025-09-15T17:59:19Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
