{"ID":3084742,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T00:41:43.715680738Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.05538","arxiv_id":"2606.05538","title":"Less is MoE: Trimming Experts in Domain-Specialist Language Models","abstract":"Mixture-of-Experts (MoE) models achieve strong performance through conditional computation, but their large parameter footprint poses deployment challenges. Prior MoE compression approaches catastrophically fail when evaluated on general-purpose benchmarks beyond commonsense reasoning. We trace this failure to the granularity of compression: important capabilities are distributed across experts but concentrated in FFN sparse intermediate dimensions. To identify these dimensions, we use Fisher importance which outperforms activation-, router-score-, and magnitude-based alternatives, and identifies tiny sets of task-critical dimensions: in Qwen1.5-MoE, removing as few as 12 of 1.35M routed-FFN intermediate dimensions collapses GSM8K accuracy while largely preserving factual-knowledge performance. Building on this, we propose Fisher-MoE, which operates within FFN to remove intermediate dimensions ranked by Fisher importance. At the same 50% MoE compression ratio, Fisher-MoE preserves model capability, while reducing weight memory by ~45% and improving inference throughput by 21%. These findings suggest intermediate dimension granularity is an effective unit for both compression and ranking where capability concentrates in MoE models.","short_abstract":"Mixture-of-Experts (MoE) models achieve strong performance through conditional computation, but their large parameter footprint poses deployment challenges. Prior MoE compression approaches catastrophically fail when evaluated on general-purpose benchmarks beyond commonsense reasoning. We trace this failure to the gran...","url_abs":"https://arxiv.org/abs/2606.05538","url_pdf":"https://arxiv.org/pdf/2606.05538v1","authors":"[\"Haoze He\",\"Xinkai Zou\",\"Xuan Jiang\",\"Xingyuan Ding\",\"Ao Qu\",\"Juncheng Billy Li\",\"Heather Miller\"]","published":"2026-06-04T00:43:20Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CL\"]","methods":"[\"Language Model\"]","has_code":false}
