{"ID":2826188,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.19219","arxiv_id":"2512.19219","title":"Selective LoRA for Visual Tokens and Attention Heads","abstract":"Low-rank adaptation (LoRA) is widely used for parameter-efficient fine-tuning, but its standard all-token, all-head design ignores the heterogeneous structure of vision language model (VLM) inputs. We introduce \\emph{Image-LoRA}, a vision-oriented PEFT recipe that views LoRA as a token-level residual update and applies this update only to visual tokens. Image-LoRA further restricts adaptation to the value path of a compact subset of attention heads, selected using a one-pass influence estimate from a rank-1 visual-token-only probe. This token-, head-, and value-selective design reduces trainable parameters and adapter-only training FLOPs while leaving the pure-text forward pass of the frozen backbone unchanged when no visual tokens are present. Across visual localization benchmarks with controlled text:image token ratios, Image-LoRA matches or closely approaches standard LoRA, while showing especially favorable trade-offs in image-token-heavy regimes. We further validate its generality on TextVQA and VideoQA, verify pure-text preservation on GSM8K, and show on ViLP that a stronger information bottleneck can yield gains over standard LoRA.","short_abstract":"Low-rank adaptation (LoRA) is widely used for parameter-efficient fine-tuning, but its standard all-token, all-head design ignores the heterogeneous structure of vision language model (VLM) inputs. We introduce \\emph{Image-LoRA}, a vision-oriented PEFT recipe that views LoRA as a token-level residual update and applies...","url_abs":"https://arxiv.org/abs/2512.19219","url_pdf":"https://arxiv.org/pdf/2512.19219v2","authors":"[\"Tiange Luo\",\"Lajanugen Logeswaran\",\"Jaekyeom Kim\",\"Justin Johnson\",\"Honglak Lee\"]","published":"2025-12-22T10:02:10Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Language Model\",\"LoRA\"]","has_code":false}
