{"ID":2859170,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.05610","arxiv_id":"2510.05610","title":"Efficient Conditional Generation on Scale-based Visual Autoregressive Models","abstract":"Recent advances in autoregressive (AR) models have demonstrated their potential to rival diffusion models in image synthesis. However, for complex spatially-conditioned generation, current AR approaches rely on fine-tuning the pre-trained model, leading to significant training costs. In this paper, we propose the Efficient Control Model (ECM), a plug-and-play framework featuring a lightweight control module that introduces control signals via a distributed architecture. This architecture consists of context-aware attention layers that refine conditional features using real-time generated tokens, and a shared gated feed-forward network (FFN) designed to maximize the utilization of its limited capacity and ensure coherent control feature learning. Furthermore, recognizing the critical role of early-stage generation in determining semantic structure, we introduce an early-centric sampling strategy that prioritizes learning early control sequences. This approach reduces computational cost by lowering the number of training tokens per iteration, while a complementary temperature scheduling during inference compensates for the resulting insufficient training of late-stage tokens. Extensive experiments on scale-based AR models validate that our method achieves high-fidelity and diverse control over image generation, surpassing existing baselines while significantly improving both training and inference efficiency.","short_abstract":"Recent advances in autoregressive (AR) models have demonstrated their potential to rival diffusion models in image synthesis. However, for complex spatially-conditioned generation, current AR approaches rely on fine-tuning the pre-trained model, leading to significant training costs. In this paper, we propose the Effic...","url_abs":"https://arxiv.org/abs/2510.05610","url_pdf":"https://arxiv.org/pdf/2510.05610v1","authors":"[\"Jiaqi Liu\",\"Tao Huang\",\"Chang Xu\"]","published":"2025-10-07T06:27:03Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false}
