{"ID":2868914,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.16105","arxiv_id":"2509.16105","title":"DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning","abstract":"Despite the significant breakthrough of Mixture-of-Experts (MoE), the increasing scale of these MoE models presents huge memory and storage challenges. Existing MoE pruning methods, which involve reducing parameter size with a uniform sparsity across all layers, often lead to suboptimal outcomes and performance degradation due to varying expert redundancy in different MoE layers. To address this, we propose a non-uniform pruning strategy, dubbed \\textbf{Di}fferentiable \\textbf{E}xpert \\textbf{P}runing (\\textbf{DiEP}), which adaptively adjusts pruning rates at the layer level while jointly learning inter-layer importance, effectively capturing the varying redundancy across different MoE layers. By transforming the global discrete search space into a continuous one, our method handles exponentially growing non-uniform expert combinations, enabling adaptive gradient-based pruning. Extensive experiments on five advanced MoE models demonstrate the efficacy of our method across various NLP tasks. Notably, \\textbf{DiEP} retains around 92\\% of original performance on Mixtral 8$\\times$7B with only half the experts, outperforming other pruning methods by up to 7.1\\% on the challenging MMLU dataset.","short_abstract":"Despite the significant breakthrough of Mixture-of-Experts (MoE), the increasing scale of these MoE models presents huge memory and storage challenges. Existing MoE pruning methods, which involve reducing parameter size with a uniform sparsity across all layers, often lead to suboptimal outcomes and performance degrada...","url_abs":"https://arxiv.org/abs/2509.16105","url_pdf":"https://arxiv.org/pdf/2509.16105v1","authors":"[\"Sikai Bai\",\"Haoxi Li\",\"Jie Zhang\",\"Zicong Hong\",\"Song Guo\"]","published":"2025-09-19T15:47:42Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[]","has_code":false}
