{"ID":2831438,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.07062","arxiv_id":"2512.07062","title":"$\\mathrm{D}^\\mathrm{3}$-Predictor: Noise-Free Deterministic Diffusion for Dense Prediction","abstract":"Although diffusion models with strong visual priors have emerged as powerful dense prediction backbones, they overlook a core limitation: the stochastic noise at the core of diffusion sampling is inherently misaligned with dense prediction that requires a deterministic mapping from image to geometry. In this paper, we show that this stochastic noise corrupts fine-grained spatial cues and pushes the model toward timestep-specific noise objectives, consequently destroying meaningful geometric structure mappings. To address this, we introduce $\\mathrm{D}^\\mathrm{3}$-Predictor, a noise-free deterministic diffusion-based dense prediction model built by reformulating a pretrained diffusion model without stochasticity noise. Instead of relying on noisy inputs to leverage diffusion priors, $\\mathrm{D}^\\mathrm{3}$-Predictor views the pretrained diffusion network as an ensemble of timestep-dependent visual experts and self-supervisedly aggregates their heterogeneous priors into a single, clean, and complete geometric prior. Meanwhile, we utilize task-specific supervision to seamlessly adapt this noise-free prior to dense prediction tasks. Extensive experiments on various dense prediction tasks demonstrate that $\\mathrm{D}^\\mathrm{3}$-Predictor achieves competitive or state-of-the-art performance in diverse scenarios. In addition, it requires less than half the training data previously used and efficiently performs inference in a single step. \nOur code, data, and checkpoints are publicly available at https://x-gengroup.github.io/HomePage_D3-Predictor/.","short_abstract":"Although diffusion models with strong visual priors have emerged as powerful dense prediction backbones, they overlook a core limitation: the stochastic noise at the core of diffusion sampling is inherently misaligned with dense prediction that requires a deterministic mapping from image to geometry. In this paper, we...","url_abs":"https://arxiv.org/abs/2512.07062","url_pdf":"https://arxiv.org/pdf/2512.07062v4","authors":"[\"Changliang Xia\",\"Chengyou Jia\",\"Minnan Luo\",\"Zhuohang Dang\",\"Xin Shen\",\"Bowen Ping\"]","published":"2025-12-08T00:39:32Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Diffusion Model\"]","has_code":false}
