{"ID":2890823,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.18212","arxiv_id":"2507.18212","title":"Prune\u0026Comp: Free Lunch for Layer-Pruned LLMs via Iterative Pruning with Magnitude Compensation","abstract":"Layer pruning has emerged as a promising technique for compressing large language models (LLMs) while achieving acceleration proportional to the pruning ratio. In this work, we identify that removing any layer induces a significant magnitude gap in hidden states, resulting in substantial performance degradation. To address this issue, we propose Prune\u0026Comp, a novel plug-and-play layer pruning scheme that leverages magnitude compensation to mitigate such gaps in a training-free manner. Specifically, we first estimate the magnitude gap caused by layer removal and then eliminate this gap by rescaling the remaining weights offline, with zero runtime overhead incurred. We further demonstrate the advantages of Prune\u0026Comp through an iterative pruning strategy. When integrated with an iterative prune-and-compensate loop, Prune\u0026Comp consistently enhances existing layer pruning metrics. For instance, when 5 layers of LLaMA-3-8B are pruned using the prevalent block influence metric, Prune\u0026Comp nearly halves the perplexity and retains 93.19% of the original model's question-answering performance, outperforming the baseline by 4.01%.","short_abstract":"Layer pruning has emerged as a promising technique for compressing large language models (LLMs) while achieving acceleration proportional to the pruning ratio. In this work, we identify that removing any layer induces a significant magnitude gap in hidden states, resulting in substantial performance degradation. To add...","url_abs":"https://arxiv.org/abs/2507.18212","url_pdf":"https://arxiv.org/pdf/2507.18212v1","authors":"[\"Xinrui Chen\",\"Hongxing Zhang\",\"Fanyi Zeng\",\"Yongxian Wei\",\"Yizhi Wang\",\"Xitong Ling\",\"Guanghao Li\",\"Chun Yuan\"]","published":"2025-07-24T09:07:20Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
