{"ID":3006112,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-04T19:14:31.964469513Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.02927","arxiv_id":"2606.02927","title":"SaluNet: Enabling Total Plasticity in Normalization-Free Deep Networks","abstract":"Normalization layers such as BatchNorm and LayerNorm have long been considered essential for stable training in deep networks. This work demonstrates that they can be fully replaced by a single learnable activation mechanism. We identify a plasticity suppression effect induced by standard normalization: learnable activation parameters rapidly lose adaptability when paired with normalization layers. Motivated by this observation, we introduce SALU (Saturated Adaptive Linear Unit), \\[ \\operatorname{SALU}(x;a,b) = \\frac{a x}{\\sqrt{1 + a b x^2}},\\quad a\u003e0,\\; b\u003e0 \\] a bounded, learnable activation that provides intrinsic signal stabilization without relying on batch statistics or external affine parameters. Building on SALU, we propose SaluNet, a paradigm grounded in total plasticity: SALU replaces normalization layers, while SWALU and GALU replace standard activations. With ResNet-18, SaluNet-C-18 achieves 97.35\\% on CIFAR-10 and 83.25\\% on CIFAR-100 without normalization, maintaining 93.44\\% and 76.23\\% at batch size 1 where normalized architectures fail. For transformers, SaluNet-T improves over LayerNorm-GELU from 90.92\\% to 91.01\\% on CIFAR-10 and from 66.54\\% to 68.10\\% on CIFAR-100. SaluNet-C-50 reaches 78.67\\% Top-1 on ImageNet-1K at $224\\times224$, and $79.23\\%$ at $288\\times288$. These results suggest normalization layers suppress total plasticity, a property biological neurons inherently possess, enabling deep networks to learn effectively.","short_abstract":"Normalization layers such as BatchNorm and LayerNorm have long been considered essential for stable training in deep networks. This work demonstrates that they can be fully replaced by a single learnable activation mechanism. We identify a plasticity suppression effect induced by standard normalization: learnable activ...","url_abs":"https://arxiv.org/abs/2606.02927","url_pdf":"https://arxiv.org/pdf/2606.02927v1","authors":"[\"Mourad Zaied\"]","published":"2026-06-01T22:09:06Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Transformer\"]","has_code":false}
