{"ID":2873398,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.06579","arxiv_id":"2509.06579","title":"CausNVS: Autoregressive Multi-view Diffusion for Flexible 3D Novel View Synthesis","abstract":"Multi-view diffusion models have shown promise in 3D novel view synthesis, but most existing methods adopt a non-autoregressive formulation. This limits their applicability in world modeling, as they only support a fixed number of views and suffer from slow inference due to denoising all frames simultaneously. To address these limitations, we propose CausNVS, a multi-view diffusion model in an autoregressive setting, which supports arbitrary input-output view configurations and generates views sequentially. We train CausNVS with causal masking and per-frame noise, using pairwise-relative camera pose encodings (CaPE) for precise camera control. At inference time, we combine a spatially-aware sliding-window with key-value caching and noise conditioning augmentation to mitigate drift. Our experiments demonstrate that CausNVS supports a broad range of camera trajectories, enables flexible autoregressive novel view synthesis, and achieves consistently strong visual quality across diverse settings. Project page: https://kxhit.github.io/CausNVS.html.","short_abstract":"Multi-view diffusion models have shown promise in 3D novel view synthesis, but most existing methods adopt a non-autoregressive formulation. This limits their applicability in world modeling, as they only support a fixed number of views and suffer from slow inference due to denoising all frames simultaneously. To addre...","url_abs":"https://arxiv.org/abs/2509.06579","url_pdf":"https://arxiv.org/pdf/2509.06579v1","authors":"[\"Xin Kong\",\"Daniel Watson\",\"Yannick Strümpler\",\"Michael Niemeyer\",\"Federico Tombari\"]","published":"2025-09-08T11:49:51Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false}