{"ID":2894140,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.12646","arxiv_id":"2507.12646","title":"Reconstruct, Inpaint, Test-Time Finetune: Dynamic Novel-view Synthesis from Monocular Videos","abstract":"We explore novel-view synthesis for dynamic scenes from monocular videos. Prior approaches rely on costly test-time optimization of 4D representations or do not preserve scene geometry when trained in a feed-forward manner. Our approach is based on three key insights: (1) covisible pixels (that are visible in both the input and target views) can be rendered by first reconstructing the dynamic 3D scene and rendering the reconstruction from the novel-views and (2) hidden pixels in novel views can be \"inpainted\" with feed-forward 2D video diffusion models. Notably, our video inpainting diffusion model (CogNVS) can be self-supervised from 2D videos, allowing us to train it on a large corpus of in-the-wild videos. This in turn allows for (3) CogNVS to be applied zero-shot to novel test videos via test-time finetuning. We empirically verify that CogNVS outperforms almost all prior art for novel-view synthesis of dynamic scenes from monocular videos.","short_abstract":"We explore novel-view synthesis for dynamic scenes from monocular videos. Prior approaches rely on costly test-time optimization of 4D representations or do not preserve scene geometry when trained in a feed-forward manner. Our approach is based on three key insights: (1) covisible pixels (that are visible in both the...","url_abs":"https://arxiv.org/abs/2507.12646","url_pdf":"https://arxiv.org/pdf/2507.12646v2","authors":"[\"Kaihua Chen\",\"Tarasha Khurana\",\"Deva Ramanan\"]","published":"2025-07-16T21:40:29Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false}