{"ID":2866242,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.21637","arxiv_id":"2509.21637","title":"BoHA: Blockwise Hadamard Product Adaptation for Parameter-Efficient Fine-Tuning","abstract":"Parameter-efficient fine-tuning (PEFT) of large language models trains a small task-specific parameter set while keeping the pretrained model frozen. The dominant Low-Rank Adaptation (LoRA) family makes this trade-off practical; however, evaluations under the same parameter budget assess single-task accuracy. In sequential adaptation settings, such evaluations should also measure how well performance on the first-stage task is retained after subsequent fine-tuning. To address this gap, we introduce BoHA, a blockwise $W_0$-coupled Hadamard product adapter that treats spatial support as an explicit design axis. BoHA partitions the frozen weight $W_0$ into a $b{\\times}b$ grid and learns an independent low-rank Hadamard product factor in each block, preserving a matched LoRA-equivalent total rank with adapter-free merged inference. On a synthetic target, BoHA at per-block rank $r_b{=}1$ exactly reconstructs an update that requires rank $b^2$ under the global $W_0$-coupled Hadamard parameterization. Across Llama-3.2-1B/3B, Mistral-7B, and Gemma-2-9B on commonsense and arithmetic reasoning tasks, BoHA outperforms LoRA across all matched-budget single-task averages and remains competitive with the strongest Hadamard baseline. On a Llama-3.2-3B commonsense $\\to$ arithmetic continual-learning diagnostic, BoHA retains $57.66\\%$ first-stage accuracy and exceeds the $W_0$-free additive-control mean by $15.23\\%$ under matched second-stage plasticity. These results demonstrate that blockwise $W_0$-coupled Hadamard adaptation is a competitive PEFT design choice when retention under sequential adaptation is part of the objective.","short_abstract":"Parameter-efficient fine-tuning (PEFT) of large language models trains a small task-specific parameter set while keeping the pretrained model frozen. The dominant Low-Rank Adaptation (LoRA) family makes this trade-off practical; however, evaluations under the same parameter budget assess single-task accuracy. In sequen...","url_abs":"https://arxiv.org/abs/2509.21637","url_pdf":"https://arxiv.org/pdf/2509.21637v2","authors":"[\"Feng Yu\",\"Jia Hu\",\"Geyong Min\"]","published":"2025-09-25T21:54:09Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Language Model\",\"LoRA\"]","has_code":false}