{"ID":2860371,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.04295","arxiv_id":"2510.04295","title":"Hypernetwork-Driven Low-Rank Adaptation Across Attention Heads","abstract":"Parameter-efficient fine-tuning (PEFT) has emerged as a powerful paradigm for adapting large-scale pre-trained models to downstream tasks with minimal additional parameters. Among PEFT methods, Low-Rank Adaptation (LoRA) stands out for its effectiveness by inserting trainable low-rank matrices into weight updates to enable efficient adaptation. However, when applied to multi-head self-attention, existing LoRA-based methods typically fine-tune each attention head independently, overlooking potential interactions and shared structure among heads. To address this limitation, we propose Hypernetwork-Driven Low-rank Adaptation (HyRA) that employs a hypernetwork to generate joint low-rank matrices for all attention heads within a layer. The shared generator promotes cross-head information sharing, helping low-rank modules avoid the redundant feature learning seen in traditional LoRA methods. Theoretically, our method achieves significantly better sample efficiency compared to standard LoRA. Empirically, we evaluate HyRA on a comprehensive suite of language and vision benchmarks. Our approach consistently outperforms existing parameter-efficient fine-tuning (PEFT) baselines across a wide range of tasks. Notably, in low-data regimes, HyRA achieves substantial improvements over LoRA, underscoring its practical sample efficiency and effectiveness in data-scarce scenarios.","short_abstract":"Parameter-efficient fine-tuning (PEFT) has emerged as a powerful paradigm for adapting large-scale pre-trained models to downstream tasks with minimal additional parameters. Among PEFT methods, Low-Rank Adaptation (LoRA) stands out for its effectiveness by inserting trainable low-rank matrices into weight updates to en...","url_abs":"https://arxiv.org/abs/2510.04295","url_pdf":"https://arxiv.org/pdf/2510.04295v2","authors":"[\"Nghiem T. Diep\",\"Dung Le\",\"Tuan Truong\",\"Tan Dinh\",\"Huy Nguyen\",\"Nhat Ho\"]","published":"2025-10-05T17:13:39Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"LoRA\"]","has_code":false}