{"ID":2870535,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.13240","arxiv_id":"2509.13240","title":"Don't Forget the Nonlinearity: Unlocking Activation Functions in Efficient Fine-Tuning","abstract":"Existing parameter-efficient fine-tuning (PEFT) methods primarily adapt weight matrices while keeping activation functions fixed. We introduce \\textbf{NoRA}, the first PEFT framework that directly adapts nonlinear activation functions in pretrained transformer-based models. NoRA replaces fixed activations with learnable rational functions and applies structured low-rank updates to numerator and denominator coefficients, with a group-wise design that localizes adaptation and improves stability at minimal cost. On vision transformers trained on CIFAR-10 and CIFAR-100, NoRA matches or exceeds full fine-tuning while updating only 0.4\\% of parameters (0.02M), achieving accuracy gains of +0.17\\% and +0.27\\%. When combined with LoRA (\\textbf{NoRA++}), it outperforms LoRA and DoRA under matched training budgets by adding fewer trainable parameters. On LLaMA3-8B instruction tuning, NoRA++ consistently improves generation quality, yielding average MMLU gains of +0.3\\%--0.8\\%, including +1.6\\% on STEM (Alpaca) and +1.3\\% on OpenOrca. We further show that NoRA constrains adaptation to a low-dimensional functional subspace, implicitly regularizing update magnitude and direction. These results establish activation-space tuning as a complementary and highly parameter-efficient alternative to weight-based PEFT, positioning activation functions as first-class objects for model adaptation.","short_abstract":"Existing parameter-efficient fine-tuning (PEFT) methods primarily adapt weight matrices while keeping activation functions fixed. We introduce \\textbf{NoRA}, the first PEFT framework that directly adapts nonlinear activation functions in pretrained transformer-based models. NoRA replaces fixed activations with learnabl...","url_abs":"https://arxiv.org/abs/2509.13240","url_pdf":"https://arxiv.org/pdf/2509.13240v2","authors":"[\"Bo Yin\",\"Xingyi Yang\",\"Xinchao Wang\"]","published":"2025-09-16T16:47:03Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Vision Transformer\",\"Transformer\",\"LoRA\"]","has_code":false}
