{"ID":2824858,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.22653","arxiv_id":"2512.22653","title":"Visual Autoregressive Modelling for Monocular Depth Estimation","abstract":"We propose a monocular depth estimation method based on visual autoregressive (VAR) priors, offering an alternative to diffusion-based approaches. Our method adapts a large-scale text-to-image VAR model and introduces a scale-wise conditional upsampling mechanism with classifier-free guidance. Our approach performs inference in ten fixed autoregressive stages, requiring only 74K synthetic samples for fine-tuning, and achieves competitive results. We report state-of-the-art performance in indoor benchmarks under constrained training conditions, and strong performance when applied to outdoor datasets. This work establishes autoregressive priors as a complementary family of geometry-aware generative models for depth estimation, highlighting advantages in data scalability, and adaptability to 3D vision tasks. Code available at \"https://github.com/AmirMaEl/VAR-Depth\".","short_abstract":"We propose a monocular depth estimation method based on visual autoregressive (VAR) priors, offering an alternative to diffusion-based approaches. Our method adapts a large-scale text-to-image VAR model and introduces a scale-wise conditional upsampling mechanism with classifier-free guidance. Our approach performs inf...","url_abs":"https://arxiv.org/abs/2512.22653","url_pdf":"https://arxiv.org/pdf/2512.22653v1","authors":"[\"Amir El-Ghoussani\",\"André Kaup\",\"Nassir Navab\",\"Gustavo Carneiro\",\"Vasileios Belagiannis\"]","published":"2025-12-27T17:08:03Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false,"code_links":[{"ID":605617,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2824858,"paper_url":"https://arxiv.org/abs/2512.22653","paper_title":"Visual Autoregressive Modelling for Monocular Depth Estimation","repo_url":"https://github.com/AmirMaEl/VAR-Depth","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
