{"ID":2859520,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.06208","arxiv_id":"2510.06208","title":"ShapeGen4D: Towards High Quality 4D Shape Generation from Videos","abstract":"Video-conditioned 4D shape generation aims to recover time-varying 3D geometry and view-consistent appearance directly from an input video. In this work, we introduce a native video-to-4D shape generation framework that synthesizes a single dynamic 3D representation end-to-end from the video. Our framework introduces three key components based on large-scale pre-trained 3D models: (i) a temporal attention that conditions generation on all frames while producing a time-indexed dynamic representation; (ii) a time-aware point sampling and 4D latent anchoring that promote temporally consistent geometry and texture; and (iii) noise sharing across frames to enhance temporal stability. Our method accurately captures non-rigid motion, volume changes, and even topological transitions without per-frame optimization. Across diverse in-the-wild videos, our method improves robustness and perceptual fidelity and reduces failure modes compared with the baselines.","short_abstract":"Video-conditioned 4D shape generation aims to recover time-varying 3D geometry and view-consistent appearance directly from an input video. In this work, we introduce a native video-to-4D shape generation framework that synthesizes a single dynamic 3D representation end-to-end from the video. Our framework introduces t...","url_abs":"https://arxiv.org/abs/2510.06208","url_pdf":"https://arxiv.org/pdf/2510.06208v1","authors":"[\"Jiraphon Yenphraphai\",\"Ashkan Mirzaei\",\"Jianqi Chen\",\"Jiaxu Zou\",\"Sergey Tulyakov\",\"Raymond A. Yeh\",\"Peter Wonka\",\"Chaoyang Wang\"]","published":"2025-10-07T17:58:11Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}