{"ID":2824925,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.21857","arxiv_id":"2512.21857","title":"Fast Inference of Visual Autoregressive Model with Adjacency-Adaptive Dynamical Draft Trees","abstract":"Autoregressive (AR) image models achieve diffusion-level quality but suffer from sequential inference, requiring approximately 2,000 steps for a 576x576 image. Speculative decoding with draft trees accelerates LLMs yet underperforms on visual AR models due to spatially varying token prediction difficulty. We identify a key obstacle in applying speculative decoding to visual AR models: inconsistent acceptance rates across draft trees due to varying prediction difficulties in different image regions. We propose Adjacency-Adaptive Dynamical Draft Trees (ADT-Tree), an adjacency-adaptive dynamic draft tree that dynamically adjusts draft tree depth and width by leveraging adjacent token states and prior acceptance rates. ADT-Tree initializes via horizontal adjacency, then refines depth/width via bisectional adaptation, yielding deeper trees in simple regions and wider trees in complex ones. The empirical evaluations on MS-COCO 2017 and PartiPrompts demonstrate that ADT-Tree achieves speedups of 3.13x and 3.05x, respectively. Moreover, it integrates seamlessly with relaxed sampling methods such as LANTERN, enabling further acceleration. Code is available at https://github.com/Haodong-Lei-Ray/ADT-Tree.","short_abstract":"Autoregressive (AR) image models achieve diffusion-level quality but suffer from sequential inference, requiring approximately 2,000 steps for a 576x576 image. Speculative decoding with draft trees accelerates LLMs yet underperforms on visual AR models due to spatially varying token prediction difficulty. 
We identify a...","url_abs":"https://arxiv.org/abs/2512.21857","url_pdf":"https://arxiv.org/pdf/2512.21857v1","authors":"[\"Haodong Lei\",\"Hongsong Wang\",\"Xin Geng\",\"Liang Wang\",\"Pan Zhou\"]","published":"2025-12-26T04:45:49Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\",\"Large Language Model\"]","has_code":false,"code_links":[{"ID":605621,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2824925,"paper_url":"https://arxiv.org/abs/2512.21857","paper_title":"Fast Inference of Visual Autoregressive Model with Adjacency-Adaptive Dynamical Draft Trees","repo_url":"https://github.com/Haodong-Lei-Ray/ADT-Tree","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
