{"ID":2921614,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-03T05:56:00.181519634Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.01062","arxiv_id":"2606.01062","title":"DAG-MoE: From Simple Mixture to Structural Aggregation in Mixture-of-Experts","abstract":"Mixture-of-Experts (MoE) models have become a leading approach for decoupling parameter count from computational cost in large language models, yet effectively scaling MoE performance remains a challenge. Prior work shows that fine-grained experts enlarge the space of expert combinations and improve flexibility, but they also impose substantial routing overhead, creating a new scalability bottleneck. In this paper, we explore a complementary axis for scaling -- how expert outputs are aggregated. We theoretically show that replacing the standard weighted-summation aggregation with structural aggregation expands the expert-combination space without altering the experts or router, and enables possible multi-step reasoning within a single MoE layer. To this end, we propose DAG-MoE, a sparse MoE framework that employs a lightweight module to automatically learn the optimal aggregation structure among the selected experts. Extensive experiments under standard language modeling settings show that DAG-MoE consistently improves performance in both pretraining and fine-tuning, surpassing traditional MoE baselines.","short_abstract":"Mixture-of-Experts (MoE) models have become a leading approach for decoupling parameter count from computational cost in large language models, yet effectively scaling MoE performance remains a challenge. Prior work shows that fine-grained experts enlarge the space of expert combinations and improve flexibility, but th...","url_abs":"https://arxiv.org/abs/2606.01062","url_pdf":"https://arxiv.org/pdf/2606.01062v1","authors":"[\"Jiarui Feng\",\"Hanqing Zeng\",\"Karish Grover\",\"Ruizhong Qiu\",\"Yinglong Xia\",\"Qiang Zhang\",\"Qifan Wang\",\"Ren Chen\",\"Dongqi Fu\",\"Jiayi Liu\",\"Zhoukai Zhao\",\"Xiangjun Fan\",\"Benyu Zhang\",\"Yixin Chen\"]","published":"2026-05-31T07:08:16Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}
