{"ID":2890174,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.19946","arxiv_id":"2507.19946","title":"SCALAR: Scale-wise Controllable Visual Autoregressive Learning","abstract":"Controllable image synthesis, which enables fine-grained control over generated outputs, has emerged as a key focus in visual generative modeling. However, controllable generation remains challenging for Visual Autoregressive (VAR) models due to their hierarchical, next-scale prediction style. Existing VAR-based methods often suffer from inefficient control encoding and disruptive injection mechanisms that compromise both fidelity and efficiency. In this work, we present SCALAR, a controllable generation method based on VAR, incorporating a novel Scale-wise Conditional Decoding mechanism. SCALAR leverages a pretrained image encoder to extract semantic control signal encodings, which are projected into scale-specific representations and injected into the corresponding layers of the VAR backbone. This design provides persistent and structurally aligned guidance throughout the generation process. Building on SCALAR, we develop SCALAR-Uni, a unified extension that aligns multiple control modalities into a shared latent space, supporting flexible multi-conditional guidance in a single model. Extensive experiments show that SCALAR achieves superior generation quality and control precision across various tasks. The code is released at https://github.com/AMAP-ML/SCALAR.","short_abstract":"Controllable image synthesis, which enables fine-grained control over generated outputs, has emerged as a key focus in visual generative modeling. However, controllable generation remains challenging for Visual Autoregressive (VAR) models due to their hierarchical, next-scale prediction style. Existing VAR-based method...","url_abs":"https://arxiv.org/abs/2507.19946","url_pdf":"https://arxiv.org/pdf/2507.19946v3","authors":"[\"Ryan Xu\",\"Dongyang Jin\",\"Yancheng Bai\",\"Rui Lan\",\"Xu Duan\",\"Lei Sun\",\"Xiangxiang Chu\"]","published":"2025-07-26T13:23:08Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":611749,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2890174,"paper_url":"https://arxiv.org/abs/2507.19946","paper_title":"SCALAR: Scale-wise Controllable Visual Autoregressive Learning","repo_url":"https://github.com/AMAP-ML/SCALAR","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}