{"ID":2883640,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.07785","arxiv_id":"2508.07785","title":"Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts","abstract":"The Mixture of Experts (MoE) architecture is a cornerstone of modern state-of-the-art (SOTA) large language models (LLMs). MoE models facilitate scalability by enabling sparse parameter activation. However, traditional MoE architecture uses homogeneous experts of a uniform size, activating a fixed number of parameters irrespective of input complexity and thus limiting computational efficiency. To overcome this limitation, we introduce Grove MoE, a novel architecture incorporating experts of varying sizes, inspired by the heterogeneous big.LITTLE CPU architecture. This architecture features novel adjugate experts with a dynamic activation mechanism, enabling model capacity expansion while maintaining manageable computational overhead. Building on this architecture, we present GroveMoE-Base and GroveMoE-Inst, 33B-parameter LLMs developed by applying an upcycling strategy to the Qwen3-30B-A3B-Base model during mid-training and post-training. GroveMoE models dynamically activate 3.14-3.28B parameters based on token complexity and achieve performance comparable to SOTA open-source models of similar or even larger size.","short_abstract":"The Mixture of Experts (MoE) architecture is a cornerstone of modern state-of-the-art (SOTA) large language models (LLMs). MoE models facilitate scalability by enabling sparse parameter activation. However, traditional MoE architecture uses homogeneous experts of a uniform size, activating a fixed number of parameters...","url_abs":"https://arxiv.org/abs/2508.07785","url_pdf":"https://arxiv.org/pdf/2508.07785v1","authors":"[\"Haoyuan Wu\",\"Haoxing Chen\",\"Xiaodong Chen\",\"Zhanchao Zhou\",\"Tieyuan Chen\",\"Yihong Zhuang\",\"Guoshan Lu\",\"Zenan Huang\",\"Junbo Zhao\",\"Lin Liu\",\"Zhenzhong Lan\",\"Bei Yu\",\"Jianguo Li\"]","published":"2025-08-11T09:15:36Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Mixture of Experts\",\"Large Language Model\",\"Language Model\"]","has_code":false}
