{"ID":2825805,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.20291","arxiv_id":"2512.20291","title":"Mixture-of-Experts with Gradient Conflict-Driven Subspace Topology Pruning for Emergent Modularity","abstract":"Mixture-of-Experts (MoE) architectures achieve parameter efficiency through conditional computation, yet contemporary designs suffer from two fundamental limitations: structural parameter isolation that causes catastrophic forgetting, and instruction-overfitting that degrades performance in instruction-free scenarios. We propose CDSP-MoE (Conflict-Driven Subspace Pruning MoE), a framework that addresses these issues through a paradigm shift from isolated expert containers to dynamic expert instantiation within a shared physical subspace. Grounded in the Universal Weight Subspace Hypothesis, CDSP-MoE maintains a super-complete parameter backbone where logical experts are carved out via learnable topology masks. Unlike prior work that uses gradient conflict for token reassignment or optimization surgery, we leverage it as a structural supervisory signal: a Lagged Gradient Game penalizes interfering connections in the shared manifold, enabling the topology to spontaneously prune conflicting pathways and evolve interpretable modular structures. Experimental results demonstrate that CDSP-MoE achieves robust content-driven routing without human-defined task labels, maintaining semantic specialization even under strict blind inference protocols where explicit instructions are absent.\nCode is available at: https://github.com/konodiodaaaaa1/Conflict-Driven-Subspace-Pruning-Mixture-of-Experts","short_abstract":"Mixture-of-Experts (MoE) architectures achieve parameter efficiency through conditional computation, yet contemporary designs suffer from two fundamental limitations: structural parameter isolation that causes catastrophic forgetting, and instruction-overfitting that degrades performance in instruction-free scenarios....","url_abs":"https://arxiv.org/abs/2512.20291","url_pdf":"https://arxiv.org/pdf/2512.20291v4","authors":"[\"Yuxing Gan\",\"Ziyu Lei\"]","published":"2025-12-23T12:00:10Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false,"code_links":[{"ID":605700,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2825805,"paper_url":"https://arxiv.org/abs/2512.20291","paper_title":"Mixture-of-Experts with Gradient Conflict-Driven Subspace Topology Pruning for Emergent Modularity","repo_url":"https://github.com/konodiodaaaaa1/Conflict-Driven-Subspace-Pruning-Mixture-of-Experts","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
