{"ID":2839941,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.14282","arxiv_id":"2511.14282","title":"Weight Concentration Regularization for Improving Pruning Robustness Under High Sparsity","abstract":"Deep neural networks achieve outstanding performance across vision and language tasks, yet their large parameter counts limit deployment in resource-constrained settings. One-shot pruning reduces model size without retraining, but models trained with standard objectives often suffer substantial accuracy drops under aggressive sparsity. Prior work mitigates this drop along two directions: regularizers such as $\\ell_1$ and DeepHoyer that shape the weight distribution during training, and pruning-robust optimizers such as SAM, CrAM, and S$^2$SAM that flatten the loss landscape. However, existing regularizers either shrink all weights uniformly ($\\ell_1$) or induce scale-invariant sparsity (DeepHoyer), without concentrating weight energy onto a small set of informative parameters. We propose a Weight Concentration Regularizer (WCR), a training-time regularizer that amplifies the magnitude of a small subset of parameters while driving the remainder toward zero, so that magnitude pruning predominantly removes parameters with negligible functional contribution. We provide a convergence analysis and evaluate WCR on LLM fine-tuning, image classification, and medical segmentation, demonstrating consistent improvements in pruning robustness across architectures and compatibility with existing pruning-robust optimizers.","short_abstract":"Deep neural networks achieve outstanding performance across vision and language tasks, yet their large parameter counts limit deployment in resource-constrained settings. One-shot pruning reduces model size without retraining, but models trained with standard objectives often suffer substantial accuracy drops under agg...","url_abs":"https://arxiv.org/abs/2511.14282","url_pdf":"https://arxiv.org/pdf/2511.14282v2","authors":"[\"Vincent-Daniel Yun\",\"Junhyuk Jo\",\"Sunwoo Lee\"]","published":"2025-11-18T09:18:26Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false}
